FOREWORD

This report develops various aspects of a new method of treating
extreme-value data. This method, based on order statistics, has a
number of advantages over existing procedures, and will be useful in
the efficient handling of small or large sets of extreme gust-load
data and extreme data in many other fields as well.

The present report completes work under a research project aimed at
the improved application of the theory of extreme values to the
analysis of gust loads of airplanes. This project, supported by the
National Advisory Committee for Aeronautics, was carried out by
Julius Lieblein under the general supervision of Dr. Churchill
Eisenhart, Chief of the Statistical Engineering Laboratory. The
Statistical Engineering Laboratory is Section 11.3 of the National
Applied Mathematics Laboratories (Division 11, National Bureau of
Standards), and is concerned with the development and application of
modern statistical methods in the physical sciences and engineering.

J. H. Curtiss
Chief, National Applied Mathematics Laboratories

A. V. Astin
Director
National Bureau of Standards
A NEW METHOD OF ANALYZING EXTREME-VALUE DATA*

by

Julius Lieblein

1. SUMMARY

A new method is presented and proposed for analyzing extreme-value
data which may arise in a wide variety of applications.

Classical applications of statistical methods, which usually concern
average values, are inadequate when the quantity of interest is the
largest (or smallest) in a set of magnitudes. This is the situation
in a number of fields, e.g., gust loads of an airplane in flight,
highest temperatures or lowest pressures in meteorology, floods and
droughts in hydrology, breaking strengths in materials testing,
breakdown voltage of capacitors, and human life spans, in all of
which applications of methods for dealing with extremes have already
been made.

Discussion of the proposed method is preceded by the necessary
statistical theory (Section 4) which also furnishes a basis for
evaluating the new method in relation to existing ones. The
techniques described provide a simple means for estimating the
necessary parameters, making predictions from the fitted curve,
estimating the reliability, and evaluating the efficiency of the
method in relation to other methods. Moreover, these quantities are
all produced by a single set of computations involving just two
worksheets. This background material is not essential to an
application of the method, and may be omitted if desired. The method
itself is summarized for practical convenience and illustrated
step-by-step in Section 5, and compared with present procedures in
Section 6. The latter section also discusses the advantages of the
proposed method, chief among which are:

(1) For the first time there is available an unbiased estimator of
known efficiency.¹

(2) The proposed estimator appears to be more efficient than a
simplified form of the Gumbel estimator in many practical cases,
namely, for samples of about 20 or more, and P = .95 or more. The
improvement in efficiency increases with increasing P or increasing
sample size. When compared with the original Gumbel estimator, the
proposed one is up to twice as efficient.

(3) The confidence intervals have a more valid theoretical basis and
are in many cases narrower than the ones in the Gumbel method.

Included in the report are several appendices giving mathematical
developments not given in the text.

* Preparation of this report was sponsored by the National Advisory
Committee for Aeronautics.

¹ The technical terms used here are defined and discussed in the main
text, and they can be located with the aid of the list of symbols in
Section 3.
2. INTRODUCTION

The statistical theory of extreme values has been found to have wide
applicability in many diverse fields, for example, meteorological
extremes, floods, droughts, breaking strength of textiles and other
types of materials, span of human life, gust loads experienced by an
airplane in flight, breakdown voltage of capacitors.

The two existing methods of analyzing extreme-value data have several
limitations, discussed in the body of this report. One of these
methods is known as the method of maximum likelihood and has been
described by B. F. Kimball (references 9, 10). The other, the method
of moments, has been developed by E. J. Gumbel (references 2, 3, 6),
and its application to gust-load problems has been discussed in
detail in a previous NACA Technical Note by Harry Press
(reference 18).

The present report gives a new method for dealing with the problem of
analyzing extreme measurements, treated in the above Technical Note,
which has certain advantages over the existing methods. The method of
application is presented in detail, together with the necessary
worksheets and other data, and the new method is compared with the
method of moments previously in use. For definiteness, the discussion
is at times presented in terms of application to gust loads, but the
methods are also applicable to other fields where extreme values
occur.

3. PRINCIPAL SYMBOLS (Listed Alphabetically)

Note: By "samples" are meant independent random samples from the
extreme-value distribution.

Symbol
    Meaning
    Equation no., etc.

$a_i,\ b_i$, $i = 1, 2, \ldots, n$
    Numerical quantities entering into weights of order-statistics
    estimator for sample of n
    (4.17); Table I

$\mathrm{cov}(\bar{y}, s)$ or $\sigma(\bar{y}, s)$
    Covariance of mean and standard deviation in samples of n from
    reduced distribution
    Page 110 (Appendix E)

$E(\ldots)$
    Mathematical expectation (or mean value) of (...)
    Various, e.g., (4.6)

$E_m,\ E_n$
    Efficiency of order-statistics estimator for subgroups of m
    observations, or for samples of n
    (4.24), (4.19); Table III, Part B

$E(s)$
    Mean value of standard deviation in samples of n from reduced
    distribution
    Page 109 (Appendix E)

$F(x) = F(x; u, \beta) = \exp[-e^{-(x-u)/\beta}]$
    Probability (cumulative) distribution function (c.d.f.) of
    extreme-value distribution with two parameters
    (4.1)

$f(x) = F'(x)$
    Density (or frequency) function of extreme-value distribution F(x)
    Page 11; Figure 1
$k$
    Number of equal subgroups of size m contained in sample of n
    Pages 33, 36

$m$
    Size of one of the k equal subgroups contained in sample of n
    Pages 33, 36

$m'$
    Size of remainder subgroup in sample of n that is left after k
    equal subgroups of m are taken, i.e., $n = km + m'$
    Page 36

$\mathrm{MSE}(\ldots)$
    Mean square error of (...); equals variance plus square of bias
    (4.13)

$n$
    Sample size (N denotes sample size in Gumbel method)
    Page 23

$P$
    Probability level associated with a predicted value
    Page 13

$Q_0$
    Numerator in Cramér-Rao lower bound: $Q_{LB} = Q_0/n$
    (4.24); Table III, Part A

$Q_{LB}$
    Cramér-Rao lower bound to variance of unbiased estimator of
    parameter (see $Q_0$)
    Page 26

$Q_m,\ Q_n$
    Variance of order-statistics "sub-estimator" for subgroups of m,
    or of estimator for sample of n
    (4.23), (4.18)


$r$
    Rank of r-th observation (counted from smallest) in samples of n
    when arranged in ascending order from smallest to largest
    observation
    Page 54

$\mathrm{R.E.}(T_1, T_2) = \mathrm{MSE}(T_2)/\mathrm{MSE}(T_1)$
    Relative efficiency of estimator $T_1$ to $T_2$ (greater than
    unity when $T_1$ is more efficient)
    (E.7)

$s = s_y = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2}$
    Standard deviation (s.d.) of sample of n from reduced distribution
    Page 109 (Appendix E)

$s_x = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
    S.d. of sample of n from original distribution
    (4.11)

$T_i$, $i = 1, 2, \ldots, k$
    Order-statistics "sub-estimator" for i-th of the k equal-size
    subgroups in samples of n
    (4.21)

$\bar{T} = \frac{1}{k}\sum_{i=1}^{k} T_i$
    Average of the sub-estimators for the k equal-size subgroups
    (4.22)

$T'$
    Sub-estimator for remainder subgroup (see $m'$)
    (4.25)

$t,\ t'$
    Weights for $\bar{T}$ and $T'$ in grand estimator for samples,
    $\hat{\xi}_P = t\bar{T} + t'T'$
    (4.26), (4.27)

[symbol illegible in source]
    Gumbel's original estimator of mode u for sample of n
    (E.1)


$\hat{u} = \bar{x} - (\sqrt{6}\,\gamma/\pi)\, s_x$
    Simplified expression used to represent Gumbel's estimator of
    mode u
    (E.2)

$u$
    Mode or location parameter of extreme-value distribution
    Page 12

$x$
    Random variable ("unreduced") having extreme-value distribution
    F(x)
    (4.1), (4.5)

$x_1, x_2, \ldots, x_n$
    The n order statistics in sample of n, i.e., the observations
    ranked in ascending order
    Page 23

$x_{\lambda n},\ x_{\mu n},\ x_{\nu n}$, $0 < \lambda < \mu < \nu < 1$
    Three selected order statistics in Mosteller method for very
    large samples of n
    Page 86 (Appendix C)

$\bar{x}$
    Sample mean in sample from original ("unreduced") distribution
    Page 20

$y$
    Reduced variate
    Page 14

$\beta$
    Scale parameter of extreme-value distribution F(x)
    Page 12

[symbol illegible in source]
    Gumbel's original estimator of $\beta$ for sample of n
    (E.1)

$\hat{\beta} = (\sqrt{6}/\pi)\, s_x$
    Simplified expression used to represent Gumbel's estimator of
    $\beta$
    (E.2)

$\gamma = 0.57721\ 56649$
    Euler's constant
    (4.8)

$\Delta_{x,n}$ [= 1.14..., remainder illegible in source]
    Half-width of 68% confidence interval in Gumbel method
    (D.1)

$\Delta'$ [= 1.141..., remainder illegible in source]
    Half-width of 68% confidence interval when modified by
    probability factor
    (D.11)

[symbol illegible in source]
    Half-width of 68% confidence interval in method of order
    statistics
    Table VIII

$\mu_x = E(x)$
    First moment or mathematical expectation of random variable x
    (4.6)

$\mu_2 = \sigma_y^2$
    Variance of reduced distribution
    (4.9)

$\xi_P = u + \beta y_P$
    The 100P-percent point of the extreme-value distribution F(x)
    Page 14

[symbol illegible in source]
    Simplified expression used to represent Gumbel estimator of
    $\xi_P$
    (E.3)

$\sigma^2(s)$
    Variance of standard deviation in samples of n from reduced
    distribution
    Table IX

$\sigma_x^2,\ \sigma_y^2$
    Population variance of x, y
    (4.7)

[symbol illegible in source]
    Plotting position of r-th observation ranked from smallest
    Page 54

$\Phi(y) = \exp(-e^{-y})$
    C.d.f. of reduced extreme-value distribution
    Page 112 (Appendix E)

4. STATISTICAL THEORY

4.1 Extreme-value distribution and meaning of parameters.

The method of analysis presented herein is based upon the assumption
that the observed maxima to be analyzed are independent observations
from a statistical distribution of the form

$$F(x) = F(x; u, \beta) = \exp\left(-e^{-(x-u)/\beta}\right). \tag{4.1}$$

This is the cumulative (or ogive) form of the distribution, which
expresses the chance that an observed extreme value (gust load, for
example) will not exceed x in value. The more familiar concept of
frequency or density function, $f(x) = F'(x)$, for this distribution
may be obtained by differentiation but is rather cumbersome (see
Appendix A) and is not needed for present purposes. The general shape
of the density function f(x) is shown in Figure 1. The meaning of the
various quantities indicated is explained below. A more detailed
graph for the case where the parameters are u = 0, $\beta$ = 1 (the
"reduced" extreme-value distribution) is plotted in Figure 2.
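As an illustrative aside, formula (4.1) and its derivative can be
evaluated directly. The sketch below is in Python; the function names
`extreme_value_cdf` and `extreme_value_pdf` are labels chosen here,
not notation from the report, and the density is simply the
derivative of (4.1) written out.

```python
import math

def extreme_value_cdf(x, u=0.0, beta=1.0):
    """F(x; u, beta) = exp(-exp(-(x - u)/beta)), formula (4.1)."""
    return math.exp(-math.exp(-(x - u) / beta))

def extreme_value_pdf(x, u=0.0, beta=1.0):
    """f(x) = F'(x), obtained by differentiating (4.1)."""
    y = (x - u) / beta
    return math.exp(-y - math.exp(-y)) / beta

# At x = u the cumulative probability is 1/e, whatever beta is.
# The values u = 10, beta = 2 are arbitrary illustration values.
p_at_mode = extreme_value_cdf(10.0, u=10.0, beta=2.0)
```

The density is largest at x = u, which is why u is referred to as the
mode in the discussion that follows.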

The distribution (4.1) has been studied extensively by Gumbel (among
others) (references 2, 3, 6) and is known as the asymptotic
distribution of largest values. We shall refer to it briefly as the
extreme-value distribution. The significance of the term "asymptotic"
is as follows: If the underlying distribution of all (not merely the
largest) gust loads (e.g., effective gust velocity, normal
acceleration) is considered, then the largest values in repeated
large samples from this distribution have a distribution of their own
which, as the sample size becomes larger and larger, approaches
closer and closer (in a certain sense) to a limiting distribution.
This limiting distribution is, according to evidence presented in
reference 18, of the form (4.1), with $1/\beta$ replacing the
parameter $\alpha$ used in the reference.
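The limiting behaviour can be checked numerically in a case where the
algebra is closed-form. The sketch assumes, purely for illustration,
an exponential parent distribution with unit mean (the report makes
no such assumption about gust loads). The largest of n such values,
shifted by ln n, then has c.d.f. (1 - e^{-y}/n)^n, which should
approach exp(-e^{-y}) as n grows. All function names are
hypothetical.

```python
import math

def cdf_of_shifted_maximum(y, n):
    """c.d.f. of (largest of n unit exponentials) - ln(n), at y."""
    tail = math.exp(-y) / n      # chance one value exceeds y + ln(n)
    if tail >= 1.0:
        return 0.0               # y + ln(n) is not positive
    return (1.0 - tail) ** n

def reduced_cdf(y):
    """Limiting form exp(-exp(-y)), i.e. (4.1) with u = 0, beta = 1."""
    return math.exp(-math.exp(-y))

def worst_gap(n, grid=None):
    """Largest absolute difference between the two c.d.f.s on a grid."""
    grid = grid or [i / 10.0 for i in range(-30, 81)]
    return max(abs(cdf_of_shifted_maximum(y, n) - reduced_cdf(y))
               for y in grid)

# The gap shrinks roughly in proportion to 1/n.
gaps = {n: worst_gap(n) for n in (5, 50, 500)}
```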
The parameters of the extreme-value distribution are depicted in
Figure 1. The quantity u is the mode or highest point of the
(frequency) distribution. The quantity $\beta$ is a scale parameter,
analogous to the standard deviation $\sigma$ in the case of the
normal distribution. In fact, $\beta$ equals $\sqrt{6}/\pi$ (about
3/4) times the standard deviation of the extreme-value distribution.

Although the two parameters u, $\beta$ completely specify the
distribution, it is desirable to introduce another quantity
$\xi = u + \beta y$* which is a linear combination of the parameters
u and $\beta$ (and therefore, since we shall assign known values to
y, itself a parameter), and makes it possible to estimate u and
$\beta$ simultaneously, rather than in terms of two separate
problems. Thus if we can estimate $\xi$ as a + by with a, b known,
then we can read off at once the values u = a, $\beta$ = b.

* That is, we are concerned with the transformed parameters
($\xi$, $\beta'$), obtained from the original parameters (u, $\beta$)
by the linear transformation $\xi = u + \beta y$, $\beta' = \beta$.
We shall henceforth give attention only to the first parameter $\xi$,
disregarding the second parameter $\beta'$ of the transformed pair
($\xi$, $\beta'$). Whenever it should become necessary to refer to
$\beta'$, however, the prime will be dropped for simplicity. (See
second footnote to page 26.)

The parameter $\xi$ has another highly important meaning. In Figure 1
the area P under the distribution to the left of the ordinate erected
at $\xi$ represents the probability that a value larger than $\xi$
will not occur. If $\xi$ is very large, then P very nearly equals the
whole area, unity, which means an observation is almost certain not
to exceed $\xi$; in other words, a value larger than $\xi$ will occur
only very rarely. Thus if P = .99, then the corresponding value of
$\xi$ has a chance of only .01 of being exceeded. To denote this
dependence of $\xi$ upon the probability P we use a subscript:
$\xi_P$. This parameter is called a percentage point or the
100P-percent point of the (extreme-value) distribution. If we can
estimate $\xi_P$ for different probability levels such as P = .90,
.95, .99, etc., then these values are precisely the predictions we
desire for, say, gust-load accelerations that will be exceeded (on
the average) only 10, 5, 1, etc., respectively, times in 100.

The explicit relationship between $\xi_P$ and P can be determined by
means of formula (4.1) for the extreme-value distribution. If x is
put equal to $\xi_P$, then P, the probability of not exceeding this
value, is simply $F(\xi_P)$. Thus

$$P = F(\xi_P) = \exp\left(-e^{-(\xi_P - u)/\beta}\right) = \exp\left(-e^{-y}\right), \tag{4.2}$$

since $\xi_P = u + \beta y$. Hence, for a given (usually large)
probability P, the corresponding $\xi_P$ is obtained by finding y
from the relation (4.2), and then writing

$$\xi_P = u + \beta y_P, \tag{4.3}$$

where the subscript P has been added to y to denote dependence on P.
Comparison of the right members of (4.1) and (4.2) shows that the
quantity y bears the following simple relation to the corresponding
variable x in (4.1):

$$y = (x - u)/\beta \tag{4.4}$$

or

$$x = u + \beta y. \tag{4.5}$$

Also, if in (4.1) we set u = 0, $\beta$ = 1, then x has the same
distribution as given by the right-hand side of (4.2). In other
words, y as defined by (4.4) or (4.5) has an extreme-value
distribution whose parameters have the extremely simple values u = 0,
$\beta$ = 1. Thus y is called the "reduced variate"* and is perfectly
analogous to the standardized variate $t = (x - \mu)/\sigma$ of
normal distribution theory. The distribution of y in (4.2), called
the reduced distribution, has been tabulated in Table 2 of reference
1[?], which also contains a table of the inverse function as well as
a number of other tables related to application of extreme-value
theory.

* The variate x is sometimes referred to as the original or
"unreduced" variate.
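Relations (4.2) and (4.3) amount to a two-step calculation: invert
(4.2) to get y for the chosen P, then scale and shift. A minimal
Python sketch follows; the function names and the parameter values
u = 10, beta = 2 are assumptions made only to show the arithmetic.

```python
import math

def reduced_variate(P):
    """Solve P = exp(-exp(-y)) for y, relation (4.2)."""
    if not 0.0 < P < 1.0:
        raise ValueError("P must lie strictly between 0 and 1")
    return -math.log(-math.log(P))

def percentage_point(P, u, beta):
    """xi_P = u + beta * y_P, formula (4.3)."""
    return u + beta * reduced_variate(P)

# Hypothetical parameters, chosen only to show the arithmetic.
u, beta = 10.0, 2.0
predictions = {P: percentage_point(P, u, beta) for P in (0.90, 0.95, 0.99)}
```

With these illustrative parameters the P = .99 point lies about 4.6
scale units above the mode, since the reduced variate for P = .99 is
roughly 4.60.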

From the above discussion it is evident that the solutions of both
the problems of estimation and prediction are embodied in the one
quantity $\xi_P = u + \beta y_P$. Estimation of this quantity will be
the main objective of the remainder of this report.


4.2 Determination of method of estimation

To avoid confusion, we distinguish between a function of sample
variables $x_1, x_2, \ldots, x_n$, such as the sample mean
$g(x_1, \ldots, x_n) = \bar{x} = (x_1 + x_2 + \cdots + x_n)/n$, and
the numerical values $g_0 = g(x_1^0, x_2^0, \ldots, x_n^0)$ assumed
by the function when the actual values of the observations
$x_i = x_i^0$ are substituted into the function. If the function is
used to estimate a parameter, we shall call it an estimator of the
parameter; the particular numerical value assumed in a given case
shall be called an estimate.
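The distinction maps naturally onto the difference between a function
and the value it returns. In the small Python sketch below, the
observations are made up and the names are hypothetical.

```python
def sample_mean(observations):
    """An estimator: a rule applied to whatever sample arrives."""
    return sum(observations) / len(observations)

# An estimate: the number the rule returns for one particular sample.
observed = [2.1, 3.4, 2.9, 4.0]      # made-up observations
estimate = sample_mean(observed)
```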

In searching for estimators the first step is to seek what are known
as sufficient statistics. A definition of this concept may be found
in any advanced text on statistical theory, for example reference 8,
Vol. II, p. [?]; but the feature of importance here is that given a
set of joint sufficient statistics, that is, certain functions of the
sample observations, it is often possible to deduce from them an
estimator with certain desirable properties, provided that the number
of such functions does not depend upon sample size. If it turns out
that the only set of sufficient statistics is the trivial set
consisting of the n functions $t_i(x_1, \ldots, x_n) = x_i$,
i = 1, ..., n, i.e. the n sample observations themselves, then
obviously this furnishes no guide whatever for constructing functions
of the x's which are optimum estimators.

Investigation reveals that, unfortunately, joint sufficient
statistics do not exist for the two parameters of the extreme-value
distribution. A proof of this fact (which was conjectured by B. F.
Kimball, reference 9, p. 299) has been discovered by I. Richard
Savage of the Statistical Engineering Laboratory of the National
Bureau of Standards, and is presented in Appendix A.
It may be noted that Kimball (ibid.) has studied a broader concept
called "set of statistical estimation functions" whereby the
estimators of the parameters are given, not by explicit formulas
involving only the sample values, but implicitly as the solutions of
a set of simultaneous equations, for example, the classical
maximum-likelihood equations. Unfortunately, such estimators do not
seem to lend themselves to the procedure referred to above for
constructing optimum estimators, and there seems to be no analytical
means of accurately evaluating the important characteristics of bias
and efficiency, defined below, for such estimators in the case of
finite samples. (Although these estimators may be asymptotically
optimum, i.e., for infinitely large samples, this need not be the
case for samples of finite size.)
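To make "implicitly as the solutions of a set of simultaneous
equations" concrete, the sketch below solves the usual pair of
likelihood equations for distribution (4.1) numerically. It is only
an illustration under stated assumptions: the equations are the
standard textbook ones (the report does not print them in this
section), the bisection scheme and the name `ml_estimates` are
choices made here, and the "data" are evenly spaced quantiles rather
than real observations.

```python
import math

def ml_estimates(xs, iterations=200):
    """Solve the two likelihood equations for (u, beta) by bisection.

    Equations assumed (standard for distribution (4.1)):
        beta = mean(x) - sum(x * w) / sum(w),   w = exp(-x / beta)
        u    = -beta * log(mean(w))
    """
    n = len(xs)
    mean_x = sum(xs) / n
    x_min, x_max = min(xs), max(xs)
    if x_max == x_min:
        raise ValueError("sample must contain two distinct values")

    def weighted_mean(beta):
        # Shift by x_min so every exponent is <= 0 (no overflow).
        w = [math.exp(-(x - x_min) / beta) for x in xs]
        return sum(x * wi for x, wi in zip(xs, w)) / sum(w)

    def residual(beta):
        return beta - (mean_x - weighted_mean(beta))

    # residual is negative at lo and positive at hi by construction.
    lo, hi = 1e-9 * (x_max - x_min), 10.0 * (x_max - x_min)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    mean_w = sum(math.exp(-(x - x_min) / beta) for x in xs) / n
    u = x_min - beta * math.log(mean_w)
    return u, beta

# Deterministic stand-in for data: reduced quantiles scaled to
# u = 10, beta = 2 (illustrative values).
n = 200
sample = [10.0 + 2.0 * -math.log(-math.log((i + 0.5) / n)) for i in range(n)]
u_hat, beta_hat = ml_estimates(sample)
```

Neither estimate can be written as an explicit formula in the
observations, which is the property the text identifies as the
obstacle to studying bias and efficiency for finite samples.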

A second method of approach to the problem of estimation is the
classical one known as the method of moments. In the case of the
extreme-value population this method is as follows.

The first two moments of the extreme-value population (4.1) are

$$\mu_x = E(x) = u + \beta E(y) \tag{4.6}$$

$$\sigma_x^2 = E[x - E(x)]^2 = \beta^2 E[y - E(y)]^2 = \beta^2 \sigma_y^2 \tag{4.7}$$

where y has the reduced extreme-value distribution (4.2), E denotes
mathematical expectation, and $\sigma^2$ is the variance, the second
moment about the mean. Using the moments of the reduced distribution
(see, e.g., reference 8, Vol. I, page 221)

$$E(y) = \gamma = .577216 \quad \text{(Euler's constant)} \tag{4.8}$$

$$\sigma_y^2 = \pi^2/6 = 1.644934 \tag{4.9}$$

we obtain

$$\mu_x = u + \gamma\beta, \qquad \sigma_x = (\pi/\sqrt{6})\,\beta, \tag{4.10}$$


relations which express the population moments in terms of the
population parameters. Therefore, if we had good estimators of the
population moments, we could readily find the parameters. This fact
constitutes the essence of the method of moments. It consists in
treating the sample as an adequate representation of the population,
replacing the population moments in the expressions which relate them
to the parameters by the corresponding sample moments, e.g., $\mu_x$
by the sample mean $\bar{x}$, and $\sigma_x$ by the sample standard
deviation

$$s_x = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}. \tag{4.11}$$

This gives $\bar{x} = u + \gamma\beta$, $s_x = (\pi/\sqrt{6})\beta$,
which yield the moment estimators of the parameters:

$$\text{for } \beta:\quad \hat{\beta} = (\sqrt{6}/\pi)\, s_x;$$

$$\text{for } u:\quad \hat{u} = \bar{x} - (\sqrt{6}\,\gamma/\pi)\, s_x. \tag{4.12}$$

These are essentially the estimators which form the basis of Gumbel's
method (reference 6, Lecture 3, eq. (3.29), with $u = \hat{u}$,
$1/\alpha = \hat{\beta}$)*. This method is justified by the fact that
under general conditions the estimator functions
$\hat{u} = \hat{u}(x_1, x_2, \ldots, x_n)$,
$\hat{\beta} = \hat{\beta}(x_1, x_2, \ldots, x_n)$ in (4.12) approach
(in a certain sense) the values of the corresponding parameters u,
$\beta$ as the sample size becomes infinite.

* The actual estimators used in the Gumbel method are slightly more
complicated (reference 6, Lecture 3, eq. (3.39)), but the difference
is not important at this point. (See Appendix E.)
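The moment estimators reduce to a few lines of arithmetic. The Python
sketch below is an illustration only: the function name is
hypothetical, the data are a toy sample, and the divisor n in the
sample standard deviation is an assumption (some treatments divide by
n - 1).

```python
import math

EULER_GAMMA = 0.5772156649015329          # E(y), formula (4.8)
SIGMA_Y = math.pi / math.sqrt(6.0)        # square root of (4.9)

def moment_estimates(xs):
    """Return (u_hat, beta_hat) from the sample mean and s.d."""
    n = len(xs)
    mean_x = sum(xs) / n
    # Divisor n is assumed here; some treatments use n - 1 instead.
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    beta_hat = s_x / SIGMA_Y              # beta_hat = (sqrt(6)/pi) s_x
    u_hat = mean_x - EULER_GAMMA * beta_hat
    return u_hat, beta_hat

u_hat, beta_hat = moment_estimates([1.0, 2.0, 3.0, 4.0, 5.0])   # toy data
```

For the toy sample the mean is 3 and the standard deviation is the
square root of 2, giving a scale estimate near 1.10 and a mode
estimate near 2.36.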
This method has apparently given satisfactory results in practice.
It is, however, subject to an important limitation. In studying
estimators it is highly desirable to know something about their
probability distributions -- if not the exact density functions, then
at least their means and variances. The mean value (mathematical
expectation) of an estimator indicates whether on the average the
estimates given by it are too high or too low relative to the actual
values of the parameter estimated -- in other words, whether there is
any bias in using the estimator. Similarly, the variance indicates
how much the estimates scatter among themselves and is the basis for
constructing a measure of efficiency which makes it possible to
compare the performances of different estimators. A more useful
concept for some purposes than variance is mean square error
(abbreviated MSE) which measures how far the estimates deviate, on
the average, not from their own mean, but from the quantity -- the
parameter -- which they are supposed to measure. There is a simple
relationship between variance and MSE, namely,

$$\text{Mean square error} = \text{variance} + (\text{bias})^2. \tag{4.13}$$
Thus, for unbiased estimators, variance and MSE are identical, and
for brevity we shall use the term "variance" in such cases. But it
should be remembered that the concept in view is actually the MSE.
This becomes especially important later when biased estimators are
discussed (Appendix E), and variance and MSE are no longer identical.
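The relation between mean square error, variance, and bias can be
verified on any collection of estimates. The numbers below are made
up, the "true value" of 10 is an assumption for the example, and
`summarize` is a hypothetical helper.

```python
def summarize(estimates, true_value):
    """Return (bias, variance, mse) of a collection of estimates."""
    k = len(estimates)
    mean_est = sum(estimates) / k
    bias = mean_est - true_value
    # Scatter of the estimates about their own mean.
    variance = sum((e - mean_est) ** 2 for e in estimates) / k
    # Scatter of the estimates about the quantity being estimated.
    mse = sum((e - true_value) ** 2 for e in estimates) / k
    return bias, variance, mse

# Made-up estimates of a parameter whose true value is taken to be 10.
bias, variance, mse = summarize([9.0, 10.5, 11.0, 11.5], true_value=10.0)
```

Here the bias is 0.5, the variance 0.875, and the MSE 1.125, which is
0.875 plus the square of 0.5.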

If we try to determine the mean (or expected) values of the
estimators, $\hat{u}$, $\hat{\beta}$, in (4.12) we find that
statistically these functions are quite complicated, leading to very
difficult multiple integrals which apparently can be evaluated
accurately only by large-scale numerical integration.* This
difficulty evidently persists if we are interested in the parameter
$\xi_P = u + \beta y_P$ instead of u or $\beta$ separately.

* Shorter methods of limited accuracy are possible and have been used
in this report for comparison purposes. (See Section 6.1 below and
Appendix E.)

4.3 Order-statistics approach for small samples

Apparently the only method of estimation which avoids this difficulty
is the method of order statistics. If the values in a sample of n
observations are arranged in, say, increasing order of size, and
denoted by $x_1, x_2, \ldots, x_n$, $x_1 \le x_2 \le \cdots \le x_n$,
then these values $x_i$ are called order statistics. The smallest is
called the first order statistic; the middle one (if n is odd), the
median; the one which is one-fourth the way up from the bottom, the
first quartile, etc. (If there are several equal ones then suitable
modifications are made in the definitions.) There is an extensive
literature on this subject, chief among which is the comprehensive
survey in reference 20.

Order statistics provide rapid and practical methods of analyzing
data. The range, $x_n - x_1$, is a very common illustration from
quality control. It is simply the difference of two order statistics,
the largest and smallest, and its properties have been extensively
studied for samples from the normal distribution. The range has been
found to yield estimates of the standard deviation of the population
that often compare very favorably with the theoretically best
obtainable. More general linear functions,
$c_1 x_1 + c_2 x_2 + \cdots + c_n x_n$, which give weight to every
sample value, have also been studied (reference 19), and values of
the coefficients have been found which make it possible to estimate
very simply and remarkably well certain quantities which previously
were obtained only by more complicated calculations.
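The ideas in the last two paragraphs (ordering the sample, the range,
and a general linear function of the ordered values) can be sketched
in a few lines of Python. The data and all names are illustrative
assumptions.

```python
def order_statistics(observations):
    """Observations in ascending order: x_1 <= x_2 <= ... <= x_n."""
    return sorted(observations)

def sample_range(observations):
    """Largest minus smallest order statistic, x_n - x_1."""
    xs = order_statistics(observations)
    return xs[-1] - xs[0]

def linear_function(observations, coefficients):
    """c_1*x_1 + c_2*x_2 + ... + c_n*x_n on the ordered sample."""
    xs = order_statistics(observations)
    if len(xs) != len(coefficients):
        raise ValueError("need one coefficient per observation")
    return sum(c * x for c, x in zip(coefficients, xs))

data = [7.0, 3.0, 9.0, 4.0, 5.0]          # made-up observations
# The range is the special case c = (-1, 0, ..., 0, 1).
same = linear_function(data, [-1, 0, 0, 0, 1]) == sample_range(data)
```

The coefficients attach to positions in the ordered sample, not to
the order in which the observations were recorded.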

We shall carry over and extend this procedure to the case of samples
from the extreme-value distribution (4.1). The method will in many
respects follow the general approach used in reference 19 for several
other distributions. The aim is to determine the weights $w_i$,
i = 1, ..., n, for all the n order statistics in a sample of size n
so that the linear estimator

$$L = \sum_{i=1}^{n} w_i x_i \tag{4.14}$$

has the properties we desire, namely:

(1) The mathematical expectation equals the parameter to be
estimated, i.e. the estimator is unbiased:

$$E(L) = \xi_P \tag{4.15}$$

(2) The MSE, which in this case is the same as the variance, is as
small as possible, consistent with (1):

$$\mathrm{MSE}(L) = \sigma^2(L) = E[L - E(L)]^2 = \text{a minimum}. \tag{4.16}$$

An estimator L which satisfies these two conditions will be denoted
by $\hat{\xi}_P$, a notation suggested by condition (1).

Condition (2) is equivalent to saying that the estimator
$\hat{\xi}_P$ is as efficient as possible under the given conditions.
This concept will be discussed below.

The mathematical formulation of this minimum-variance problem is
developed in Appendix B, and the solutions (the weights) are shown in
Table I for n = 2 to n = 6. The case n greater than 6 is discussed in
the next Section. For each given value of n, n weights
$w_1, w_2, \ldots, w_n$ are determined that depend on the quantity
$y_P$ that occurs in the parameter $\xi_P = u + \beta y_P$ to be
estimated. The $w_i$ are each of the following form:

$$w_i = a_i + b_i y_P, \qquad i = 1, 2, \ldots, n. \tag{4.17}$$

Substituting these weights for given n into (4.16) actually gives the
minimum value that the variance can attain under the above
conditions, and this value depends upon $y_P$ quadratically:

$$V_{\min} = Q_n = (A_n y_P^2 + B_n y_P + C_n)\,\beta^2. \tag{4.18}$$

Table I gives the values $a_i$, $b_i$, $A_n$, $B_n$, $C_n$, which
have been found by exact computation methods as indicated in
Appendix B and Table II there described. The quantities
$V_{\min} = Q_n$ are shown in Table III, Part A.
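For the smallest case, n = 2, the weights can be worked out by hand,
which gives a feel for forms (4.17) and (4.18). The sketch below is a
derivation made for this illustration, not a transcription of
Appendix B or Table I. It rests on two facts about the reduced
distribution: the larger of two independent reduced variates has
c.d.f. exp(-2e^{-y}), i.e. it is a reduced variate shifted by ln 2,
and the two ordered values sum to the same total as the two unordered
ones. Unbiasedness for every u and beta requires the weights to sum
to 1 and the weighted mean of the reduced order statistics to equal
y_P. With n = 2 those two requirements already fix both weights, so
no minimization remains.

```python
import math

EULER_GAMMA = 0.5772156649015329
LN2 = math.log(2.0)

# Moments of the two reduced order statistics y_1 <= y_2 (n = 2).
MEAN_Y1, MEAN_Y2 = EULER_GAMMA - LN2, EULER_GAMMA + LN2
VAR_Y2 = math.pi ** 2 / 6.0               # a shift leaves variance alone
VAR_Y1 = math.pi ** 2 / 6.0 - 2.0 * LN2 ** 2
COV_Y1_Y2 = LN2 ** 2

def weights_n2(y_P):
    """Weights w_i = a_i + b_i * y_P fixed by unbiasedness when n = 2."""
    b2 = 1.0 / (MEAN_Y2 - MEAN_Y1)        # = 1 / (2 ln 2)
    a2 = -MEAN_Y1 * b2
    a1, b1 = 1.0 - a2, -b2
    return (a1 + b1 * y_P, a2 + b2 * y_P), (a1, b1, a2, b2)

def variance_n2(y_P, beta=1.0):
    """Variance of w_1*x_1 + w_2*x_2; quadratic in y_P as in (4.18)."""
    (w1, w2), _ = weights_n2(y_P)
    q = w1 * w1 * VAR_Y1 + w2 * w2 * VAR_Y2 + 2.0 * w1 * w2 * COV_Y1_Y2
    return q * beta ** 2

(_, coefficients) = weights_n2(0.0)       # (a1, b1, a2, b2)
variance_at_mode = variance_n2(0.0)       # value at y_P = 0
```

Under these assumptions the smaller observation carries a weight of
about 0.92 when estimating the mode (y_P = 0), and the variance there
is about 0.66 times beta squared. For n of 3 or more the unbiasedness
conditions no longer determine the weights, and the full minimization
is needed.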


As  sample  size  increases , the  estimation  is 
expected  to  improve  and  the  variance  to  diminish,. 

In  order  to  have  a convenient  standard  of  compari- 

t ft 

son,  in  the  case  of  unbiased  estimators,  we  scale 
all  variances  by  dividing  into  a theoretically  specified 
variance  ( Q_  , known  as  the  ”Cramer~Rao  lower  bound” 

(reference  1,  pp®  4^0?  equation  (32*3  <>3a)  which 


Tc  For  biased  estimators,  see  Appendix  E® 

■JHS’  This  Cramer-Rao  bound  is  given  for  the  case  where 
the  distribution  has  only  one  parameter  to  be 
estimated®  For  the  extreme -value  distribution  with 
the  two  parameters  (J, p)  we  can  regard  6 as  a 
^nuisance  parameter”  and  thus  obtain  a "Cramer-Rao 
bound”  for  § , the  expression  for  which  will 
involve  (3  (see  first  footnote  to  Table  III)®  This 
procedure  is  based  on  the  ^method  of  nuisance 
parameters”  discussed  in  reference  11®  (See  also 
footnote  to  page  13®) 


2? 


is less than or at most equal to the variance of any
(unbiased) estimator of the parameter in question.* The
result is then an absolute number between 0 and 1 which, when
expressed as a percentage, is called the efficiency of the
estimator for samples of n:

    Efficiency (L) = $E_n(L) = Q_{LB} / Q_n$.        (4.19)


The quantities $E_n$, which evidently depend upon $y_P$, and
therefore upon P, are given for n = 2 to 6 for selected values
of the probability P in Table III, Part B. Part A contains the
numerical values of the variances $Q_n$ and the lower bound
$Q_{LB}$ in terms of the parameter $\beta^2$. The expression
for $Q_{LB}$ has been implicitly given in reference 10, p. 113,
and is indicated in the first footnote to Table III, Part A
of this report.

* There may or may not exist estimators whose variances reach
the lower limit $Q_{LB}$. If (as may happen) there exists a
$Q' > Q_{LB}$ such that the variance of every estimator is
$\geq Q'$ (and of course $> Q_{LB}$), then $Q'$ may be
substituted for $Q_{LB}$ in the numerator of the above
expression for efficiency (4.19) without the fraction
exceeding 1. The investigation of the existence of $Q'$ is too
complex a matter for purposes of this report. However, the
only effect of using a lower bound which is too low is to
understate the efficiency, so that the results are on the
safe, conservative side.

Two points should be noted about the choice of probability
levels shown in Table III. The value P = .36788 = 1/e, which
corresponds to $y_P = 0$, is important because it gives the
mode, one of the desired parameters of the distribution. This
is evident from the fact that the parameter we are estimating
is $\xi_P = u + \beta y_P = u$, the mode, for $y_P = 0$.
Similarly, the limiting value P = 1 corresponds to the scale
parameter $\beta$. This may be seen as follows. If P
approaches 1, then $\xi_P$ and $y_P$ both become indefinitely
large, but their ratio $\xi'_P = \xi_P / y_P = (u / y_P) + \beta$
may be considered to be a new parameter which approaches
$\beta$, since the mode u remains fixed and finite (as does
also $\beta$). Hence we may estimate $\beta$ by first
estimating $\xi'_P$ for arbitrary P and then letting P
approach 1.

Now from (4.14) and (4.17), the linear estimator
$L = \hat{\xi}_P$ is of the form

    $\hat{\xi}_P = f_1 + y_P f_2$        (4.20)

where $f_1$ and $f_2$ are functions of the sample values which
do not involve $y_P$. By the preceding remark, we can then
estimate the parameter $\beta$ by writing down the
corresponding estimator of $\xi'_P$,

    $\hat{\xi}'_P = \hat{\xi}_P / y_P = f_1 / y_P + f_2$,

and letting P approach 1, obtaining

    $\hat{\beta} = f_2$

as the corresponding estimator of $\beta$.

In other words, an estimator $\hat{\beta}$ of $\beta$ may be
obtained by simply taking the coefficient of $y_P$ in
$\hat{\xi}_P$ when written in the form (4.20). Similarly, the
variance of $\hat{\beta}$ is the coefficient of $y_P^2$ in the
variance of $\hat{\xi}_P$. This may readily be seen as follows.
From (4.20), we have

    $\sigma^2(\hat{\xi}_P) = \sigma^2(f_1) + 2 y_P\,\mathrm{cov}(f_1, f_2) + y_P^2\,\sigma^2(f_2)$
                           $= A + B y_P + C y_P^2$,


where A, B, C are quantities which do not involve $y_P$
(though they may involve $\beta$, in general); thus, as P
approaches 1 and $y_P$ increases without limit,

    $\sigma^2(\hat{\xi}'_P) = \sigma^2(\hat{\xi}_P) / y_P^2 = A / y_P^2 + B / y_P + C \to C$,

the coefficient of $y_P^2$ in $\sigma^2(\hat{\xi}_P)$. From
this it follows also that the efficiency of the estimator
$\hat{\beta}$, being a ratio of variances, is simply the ratio
of the coefficients of $y_P^2$, the other terms being
disregarded.

These facts applied to the estimator $\hat{\xi}_P$ make it
possible to avoid a separate treatment for the two parameters
u and $\beta$. Their estimators are each represented by a
single line in a table (such as Table III) showing values for
various probability levels: P = .36788 (or $y_P = 0$) gives u;
P = 1 (or $y_P = \infty$) gives $\beta$.
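The limiting argument can be checked numerically. The sketch below evaluates the variance of the ratio estimator for increasing $y_P$; the coefficients A, B, C used in the demonstration are arbitrary placeholders chosen for illustration, not values from Table I.

```python
def ratio_variance(A, B, C, y_p):
    """Variance of xi_hat_P / y_P when Var(xi_hat_P) = A + B*y_P + C*y_P**2."""
    return (A + B * y_p + C * y_p ** 2) / y_p ** 2


# Placeholder coefficients (hypothetical); the limit should approach C.
A, B, C = 1.0, 0.5, 0.25
for y_p in (5.0, 50.0, 500.0, 5000.0):
    print(y_p, ratio_variance(A, B, C, y_p))
```

As $y_P$ grows the printed values settle toward C, which is why only the coefficient of $y_P^2$ matters for the scale parameter.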


The concepts of variance and efficiency have also a more
concrete, practical significance. The lower bound to the
variance has the form $Q_{LB} = Q_0 / n$, where $Q_0$ is a
quadratic function of $y_P$, but is independent of sample size
n. For two samples of sizes n' and n'', the variances
$Q'_{LB}$, $Q''_{LB}$ are in the ratio

    $Q'_{LB} / Q''_{LB} = n'' / n'$,

i.e., inversely proportional to sample size. Similarly, if we
had two estimators for the same sample size, we could form
the ratio of their variances and think of it as representing
an (inverse) ratio of (hypothetical) sample sizes. Thus, if
for a sample of 20, the variance Q' of one estimator were
one-half the variance Q'' of an alternative estimator, then
the first estimator would require a sample of only 10 to give
as much information as could be obtained with the second from
a sample of 20. This saving of half the number of
observations is expressed by saying that the first estimator
is twice as efficient as the second. In general, a saving of
the fraction p of the observations makes one estimator
1/(1-p) times as efficient as a second.
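The arithmetic of this comparison can be sketched as below, assuming (as in the text) that each estimator's variance is inversely proportional to the sample size. The function names are illustrative.

```python
def relative_efficiency(var_first, var_second):
    """How many times as efficient the first estimator is as the second."""
    return var_second / var_first


def equivalent_sample_size(n_second, var_first, var_second):
    """Sample size at which the first estimator matches the second at n_second.

    Both variances are taken at the same sample size n_second.
    """
    return n_second * var_first / var_second


def efficiency_from_saving(p):
    """A saving of the fraction p of observations: 1/(1 - p) times as efficient."""
    return 1.0 / (1.0 - p)


# The example in the text: first variance is one-half the second, n = 20.
print(relative_efficiency(0.5, 1.0))
print(equivalent_sample_size(20, 0.5, 1.0))
print(efficiency_from_saving(0.5))
```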

The efficiencies of the estimators $\hat{\xi}_P$ in Table III
are more conveniently compared in graphical form, as in
Figure 3. The heavy horizontal line at the top indicates
perfect or 100 percent efficiency, and the rising curves as n
increases show how closely the estimator is approaching the
standard of perfection. The most outstanding fact is that, in
marked contrast to a theoretical, perfect estimator, the
efficiency of the actual estimator $\hat{\xi}_P$ depends upon
the probability P, being largest for the middle ranges .40 to
.60 and dropping considerably at the ends near 0 and 1. Since
analysis of extreme (largest) data is concerned chiefly with
the larger magnitudes associated with very small
probabilities of occurring or being exceeded, we shall limit
our interest to the range above P = .90. For n = 6 the
efficiency exceeds the 80 percent level for all values of P
in this range that are apt to occur in practice (i.e.,
P < .999). In view of the satisfactory values of efficiency,
further calculation for n > 6 did not appear warranted at
this time, particularly since it became apparent that the
labor of computation would increase out of all proportion to
the rapidly diminishing improvement in efficiency.

Of course, most samples of observations are larger than the
trivial size of 6, and the question arises of how to handle
the larger samples. This is treated in the next subsection.


Extension to larger samples

The key to handling samples with more than six observations
is to treat them as sets or subgroups of samples of 6 (or if
necessary, 5). If a sample size is not an exact multiple of 6
or of 5, then the sample may be treated as consisting either
of subgroups of 6 with an odd group remaining having less
than 6 items, or of subgroups of 5 with a remaining group of
6. We first deal with the simpler case where n is an exact
multiple of 5 or 6.

Case I. Sample size an exact multiple of 5 or 6. Suppose, in
general, n = km, where m is the size of subgroup, which need
not be 6, and k is the number of subgroups in the sample. If
the sample is so divided into subgroups that the observations
in one subgroup may be considered to be statistically
independent of those in any other subgroup, then it is
legitimate to treat the sample as consisting of k independent
subsamples, each of size m.

One way of obtaining independent groups is by use of random
numbers. This, however, will lose valuable information
embodied in the order in which the data were actually
observed. If the data are truly random, so that, for example,
there are no seasonal effects, then this implies that
subgroups formed in the order in which the data are observed
(the first m values observed put into the first group, the
next m into the second, etc.) should be independent. This
assumption, of course, underlies the entire method of
estimation described in this report, and we shall adopt it in
our procedures.

From each subgroup we form the "sub-estimator"

    $T_i = \sum_{j=1}^{m} w_j x_j$,   i = 1, 2, ..., k,        (4.21)

where the $x_j$ are the observations of the i-th subgroup
arranged in increasing order, and the weights
$w_1, w_2, \ldots, w_m$ are those taken from Table I for
sample size m and are the same for each subgroup of m values
(but, of course, are different for different sizes m). These
k sub-estimators $T_i$ are then combined by simple averaging
to form the grand sample estimator:

    $T = \frac{1}{k} \sum_{i=1}^{k} T_i$.        (4.22)

The variance of this estimator is simply

    $\mathrm{Var}(T) = Q_m / k$,        (4.23)

since we are taking the variance of a mean of k independent
quantities $T_i$, each of which has the same variance,*
$\mathrm{Var}(T_i) = Q_m$; $Q_m$ denotes the variance
tabulated in Table III, Part A, for m = 2, 3, 4, 5, 6. The
efficiency of T is, since n = km,

    $E = \frac{Q_{LB}}{\mathrm{Var}(T)} = \frac{Q_0 / km}{Q_m / k} = \frac{1}{m} \frac{Q_0}{Q_m} = E_m$,        (4.24)

where $Q_{LB} = Q_0 / n = Q_0 / km$, and $Q_0$ is independent of
n. Thus we have the important fact that if a sample is broken
into equal-size subgroups, the efficiency of the
order-statistics estimator depends only upon the size (m) of
the subgroup (and, of course, P).
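The Case I computation can be sketched as follows: the sample is cut into k consecutive subgroups of m in the order observed, each subgroup is sorted, the same weights are applied to each, and the k results are averaged. The weights in the demonstration are placeholders chosen only so that the arithmetic is easy to follow; they are NOT the Table I values, which must be supplied for real use. Function names are illustrative.

```python
def subgroup_estimate(sample, m, a, b, y_p):
    """Average of the k sub-estimators T_i for a sample of size k*m.

    sample : observations in the order observed
    a, b   : weight coefficients for size m, so that w_j = a_j + b_j * y_p
    """
    if len(sample) % m != 0:
        raise ValueError("sample size must be an exact multiple of m")
    k = len(sample) // m
    weights = [aj + bj * y_p for aj, bj in zip(a, b)]
    total = 0.0
    for i in range(k):
        group = sorted(sample[i * m:(i + 1) * m])  # order within the subgroup
        total += sum(w * x for w, x in zip(weights, group))
    return total / k


def variance_of_mean(q_m, k):
    """Variance of the mean of k independent sub-estimators of variance q_m."""
    return q_m / k


# Placeholder weights for m = 2 (hypothetical, not from Table I).
a_demo, b_demo = (0.5, 0.5), (-1.0, 1.0)
print(subgroup_estimate([3, 1, 2, 6], 2, a_demo, b_demo, y_p=0.0))
print(subgroup_estimate([3, 1, 2, 6], 2, a_demo, b_demo, y_p=1.0))
```

Because the subgroups are formed in the order of observation, the caller should pass the data unsorted; sorting happens only inside each subgroup.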

Since, according to Table III, Part B, efficiency increases
with sample (or subgroup) size, it follows that when there is
a choice, a sample should be broken into subgroups as large
as possible for best efficiency, i.e., into subgroups of 6.
If this is not possible, but if the sample size n is an exact
multiple of 5, then subgroups of 5 may be used with not much
loss in
* These variances are equal because they depend only upon m,
P, and $\beta$, which are constant for all the subgroups of
the same sample.




efficiency. The last two columns in Table III, Part B, show
that the loss is 2.4 percent (= .8647 - .8404) at P = .95, and
rises to a maximum of 3.8 percent for the limiting value
P = 1.

Case II. Sample size not an exact multiple of 5 or 6. In most
cases, of course, the sample size will have a remainder when
divided by both 5 and 6. There are then a great variety of
choices as to how to partition n into subgroups of 6 and 5,
and perhaps other sizes. Many of these possibilities have
been examined, the aim being to establish as simple rules as
possible without too great a loss in efficiency. Fortunately,
most of the methods of partitioning a sample of given size n
do not lead to greatly different efficiencies. Thus the
following rules can be laid down for n ≥ 7 (n ≤ 6 does not
involve breaking into subgroups) based on writing n in the
form either 6k + m' or 5k + 6, where m' < 6:

(a) n = 7 up to large values.

    (i) Use n = 6k + m', split up into k subgroups of 6 and a
        remainder subgroup of m' < 6 items, unless n = 31, 61,
        etc., i.e., a multiple of 30 plus 1.

    (ii) If n = 30k + 1, write it as
         n = (30k - 5) + 6 = (6k - 1) x 5 + 6, i.e., split
         sample into (6k - 1) subgroups of 5 and a "remainder"
         subgroup of 6.

(b) n extremely large. If sample size is of the order of
    several hundred or more, so that the number of subgroups
    is of the order of 50 or 100, then the amount of
    computation becomes increasingly laborious. For such very
    large samples of extremes, which are rather rare, a
    short-cut method is available which is explained in
    Appendix C. While its efficiency is substantially less
    than that of the longer method presented here, it is
    nevertheless of practical value inasmuch as the loss in
    efficiency, which in practical terms means an effective
    loss in number of observations, is not very important
    when a very extensive amount of data happens

    to be available.

The variance and efficiency of an estimator for most sample
sizes (Rule (a)) can be discussed readily in general terms.
Assume that n = km + m' represents the separation of the
sample into two parts: one consisting of k equal subgroups of
size m = 5 or 6; the other consisting of the remainder
subgroup of size m' < m, except for the exceptional case
where m = 5, m' = 6 (Case II, rule (a)(ii)). The average, T,
is formed from the first part as described under Case I. Then
a sub-estimator T' is formed from the remainder subgroup of
m' values using the weights $w'_i$ for samples of size m':

    $T' = \sum_{i=1}^{m'} w'_i x'_i$,        (4.25)

where $x'_i$, i = 1, 2, ..., m', denotes the m' values in the
subgroup. Finally a weighted average of T and T' is formed,
and this is the grand sample estimator

    $\hat{\xi}_P = tT + t'T'$,        (4.26)

where the multipliers* are

    $t = km/n$,   $t' = m'/n$.        (4.27)

Since all the subgroups are independent, so are T and T',
whence

    $\mathrm{Var}(\hat{\xi}_P) = t^2 \frac{Q_m}{k} + t'^2 Q_{m'}$,

since the variance of the mean T is $\frac{1}{k} Q_m$.
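A short sketch of the combination step follows. The first two functions restate the proportional multipliers and the variance formula. The third expresses the combined efficiency directly in terms of the subgroup efficiencies; this closed form is not stated in the report but follows from substituting $Q_m = Q_0 / (m E_m)$ and $Q_{m'} = Q_0 / (m' E_{m'})$ into the variance, after which $Q_0$ cancels. The efficiencies 0.80 and 0.70 in the demonstration are illustrative numbers, not table entries.

```python
def multipliers(k, m, m_rem):
    """Proportional multipliers t = k*m/n and t' = m'/n."""
    n = k * m + m_rem
    return k * m / n, m_rem / n


def combined_variance_coeff(k, m, m_rem, q_m, q_rem):
    """t**2 * Q_m / k + t'**2 * Q_m' (Q values as multiples of beta**2)."""
    t, t_rem = multipliers(k, m, m_rem)
    return t ** 2 * q_m / k + t_rem ** 2 * q_rem


def combined_efficiency(k, m, m_rem, e_m, e_rem):
    """Efficiency of the combined estimator: n / (k*m/E_m + m'/E_m')."""
    n = k * m + m_rem
    return n / (k * m / e_m + m_rem / e_rem)


t, t_rem = multipliers(3, 6, 5)  # the partition 23 = 3*6 + 5
print(round(t, 5), round(t_rem, 5))
print(combined_efficiency(3, 6, 5, 0.80, 0.70))  # illustrative efficiencies
```

The closed form is a weighted harmonic mean of the two subgroup efficiencies, which makes plain why the result lies between them and approaches that of the main subgroups when k is large.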

From the above it is evident that once the partitioning of
sample size n into n = km + m' is determined, the variance
and efficiency may be obtained except (in the case of the
variance) for a factor $\beta^2$ which must be estimated from
the data. Table IV lists for convenience the efficiencies at
two probability levels, P = .99 and the limiting value P = 1,
for most of the sample sizes that


* Other multipliers are possible. In particular, there is an
optimum set of multipliers which produces an unbiased
estimator $\hat{\xi}_P$ with slightly smaller variance, and
hence slightly greater efficiency. The optimum multipliers
are, however, less simple than the proportional ones (for
example, they are not constants but depend on P), and the
gain in efficiency is not great. This was shown by a number
of trials and by the fact that in any event, the efficiency
cannot exceed that for the larger subgroup size, $E_m$ (or
$E_{m'}$, if m' > m), and does not differ much from it if the
total sample size n is at all sizable, say > 20.


may occur in practice with gust-load data, provided the
sample is split up according to the above rules. The levels
P = .99 and P = 1 furnish a convenient basis for comparing
the efficiencies of two different partitions of the sample
size. At this end of the probability scale the difference
between the two efficiencies decreases monotonically as P
decreases. Thus, if the difference in efficiencies is 3
percent at P = .99 and 4 percent at P = 1, then the
difference is between 3 and 4 percent at P = .995, say, and
at P = .95 and under is apt to be substantially below 3
percent, a difference negligible for practical purposes. The
partitions shown in Table IV are those recommended by rule
(a) above. In certain cases, the efficiencies of alternative
partitions are shown in the footnotes to Table IV for use in
case the extra few percentage points in efficiency are
considered to be worth a little loss of simplicity in
computation.

There are some useful a priori guides for judging the
efficiency in any given case even beyond the limit n = 4^ of
Table IV. Thus, if n = km + m', it is clear that the
efficiency cannot exceed that for the subgroup sizes m and
m', but must lie




somewhere between the efficiencies corresponding to these two
sample sizes. If m and m' are not far apart, then, regardless
of the number of subgroups k, the efficiency is determined
between narrow limits. Again, if k is substantial, say near
10 or more, then the efficiency is practically that for the
larger sample size m. Of course the maximum efficiency
obtainable by the procedure outlined here is for Case I when
the sample size is an exact multiple of 6. For P = .99 the
efficiency in such case is 83.2 percent, and for P = 1, it is
76.8 percent. If any given partition results in efficiencies
within, say, 2 or 3 percent of these values, then there is
nothing significant to be gained by using any other
partition, unless it is such as to simplify the computation.

SUMMARY OF PROCEDURES

The above method of analysis will now be summarized for ease
of reference. The use of the method has been considerably
simplified by the construction of specially designed
worksheets, represented by the pair of blank specimen forms
following this page. A completely filled-out pair (Worksheets
1 and 2) will be found immediately preceding the tables at
the end of this report. With the aid of such worksheets about
two hours should be sufficient for all the calculations for a
moderate-size sample, such as the sample of 23 observations
analyzed below, and it has been found that this period is
even sufficient to include the graphical analysis also
presented.

The materials needed for application of the method, besides
Worksheets 1 and 2 and a sheet of extreme probability paper,
are, in the order in which needed:

    (i) Table IV, showing efficiencies for various methods of
        splitting sample into subgroups.

    (ii) Table I, giving the weights $a_i$, $b_i$.

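    (iii) Table III, furnishing the quantities

ESTIMATION OF EXTREMES
Worksheet 1 - Determination of Estimators

Source:                Computer:                Date:

Record No. | Observed extremes

I. SUBGROUP SIZES AND PROPORTIONALITY FACTORS:
     n = ____ = km + m' = ____ x ____ + ____
     m = ____
     t  = km/n = ____        t^2/k = 0.____
     t' = m'/n = 0.____      t'^2  = 0.____

IIA. MAIN SUBGROUPS:
     Weights $a_i$, $b_i$ (from Table I)

     i     | 1 | 2 | 3 | 4 | 5 | 6 | Check sum
     $a_i$ |   |   |   |   |   |   |
     $b_i$ |   |   |   |   |   |   |

     Observations $x_i$ in increasing order from i = 1 to i = m

     Subgroup No. | $x_1$ | $x_2$ | $x_3$ | $x_4$ | $x_5$ | $x_6$ | Check sum | $\sum a_i x_i$ | $\sum b_i x_i$
     1            |       |       |       |       |       |       |           |                |
     2            |       |       |       |       |       |       |           |                |
     3            |       |       |       |       |       |       |           |                |
     4            |       |       |       |       |       |       |           |                |
     5            |       |       |       |       |       |       |           |                |
     k            |       |       |       |       |       |       |           |                |
     Sum          |       |       |       |       |       |       |           |                |

     T = $\sum a_i x_i / k + (\sum b_i x_i / k)\, y_P$ = ____ + ____ $y_P$

IIB. REMAINDER SUBGROUP:
     Weights $a'_i$, $b'_i$ (from Table I)

     i      | 1 | 2 | 3 | 4 | 5 | 6 | Check sum
     $a'_i$ |   |   |   |   |   |   |
     $b'_i$ |   |   |   |   |   |   |

     Observations $x'_i$ in increasing order from i = 1 to i = m'

     $x'_1$ | $x'_2$ | $x'_3$ | $x'_4$ | $x'_5$ | $x'_6$ | Check sum | $\sum a'_i x'_i$ | $\sum b'_i x'_i$
            |        |        |        |        |        |           |                  |

     T' = $\sum a'_i x'_i + (\sum b'_i x'_i)\, y_P$ = ____ + ____ $y_P$

III. ESTIMATORS:
     $\hat{\xi}_P = tT + t'T'$ = ____ + ____ $y_P$
     $\hat{u}$ = ____        $\hat{\beta}$ = ____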
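ESTIMATION OF EXTREMES
Worksheet 2 - Predicted values, confidence band, efficiency, plotting positions

Column headings:
    (1) P
    (2) $y_P$
    (3) PREDICTED VALUES: $\hat{\xi}_P$ = ____ + ____ $y_P$
    (4) $Q_m$ = Q____ (from Table III)
    (5) $Q_{m'}$ = Q____ (from Table III)
    (6) $\mathrm{Var}(\hat{\xi}_P) = (t^2/k)\, Q_m + t'^2 Q_{m'}$
    (7) 68%-CONFIDENCE BAND HALF-WIDTH: $\sigma(\hat{\xi}_P) = \sqrt{\mathrm{Var}(\hat{\xi}_P)}$
    (8) $Q_{LB} = Q_0 / n$ ($Q_0$ from Table III)
    (9) EFFICIENCY: $E = Q_{LB} / \mathrm{Var}(\hat{\xi}_P)$

Entered above the columns: $t^2/k$ = ____ (column 4); $t'^2$ = ____ (column 5); $\hat{\beta}$ = ____ (column 7)

    (1)     | (2)     | (3) | (4)               | (5)               | (6)               | (7) | (8)               | (9)
    .36788  | 0       |     | $\beta^2$         | $\beta^2$         | $\beta^2$         |     | $\beta^2$         |      Estimate of u
    .50     | 0.36651 |     | $\beta^2$         | $\beta^2$         | $\beta^2$         |     | $\beta^2$         |
    .90     | 2.25037 |     | $\beta^2$         | $\beta^2$         | $\beta^2$         |     | $\beta^2$         |
    .95     | 2.97020 |     | $\beta^2$         | $\beta^2$         | $\beta^2$         |     | $\beta^2$         |
    .99     | 4.60015 |     | $\beta^2$         | $\beta^2$         | $\beta^2$         |     | $\beta^2$         |
    .999    | 6.90726 |     | $\beta^2$         | $\beta^2$         | $\beta^2$         |     | $\beta^2$         |
    1       | -       | -   | $y_P^2 \beta^2$   | $y_P^2 \beta^2$   | $y_P^2 \beta^2$   | ... | $y_P^2 \beta^2$   |

PLOTTING POSITIONS

Observed extremes in increasing rank from 1 to n = ____

    Rank r | Observed extreme | Plotting position r/(n+1) || Rank r | Observed extreme | Plotting position r/(n+1) || Rank r | Observed extreme | Plotting position r/(n+1)
    1      |                  |                           || 11     |                  |                           || 21     |                  |
    2      |                  |                           || 12     |                  |                           || 22     |                  |
    3      |                  |                           || 13     |                  |                           || 23     |                  |
    4      |                  |                           || 14     |                  |                           || 24     |                  |
    5      |                  |                           || 15     |                  |                           || 25     |                  |
    6      |                  |                           || 16     |                  |                           || .      |                  |
    7      |                  |                           || 17     |                  |                           || .      |                  |
    8      |                  |                           || 18     |                  |                           || .      |                  |
    9      |                  |                           || 19     |                  |                           ||        |                  |
    10     |                  |                           || 20     |                  |                           || n      |                  |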

    Sum

The assumptions upon which the method is based are that the
data in the given sample (arranged in the order in which
observed*) may be treated as independent random observations
all from the same population

    $F(x) = \exp\left(-e^{-(x-u)/\beta}\right)$

(in cumulative form), with constant unknown parameters u,
$\beta$ to be estimated.

For concreteness, the rules below refer to an actual example,
worked out in Worksheets 1, 2, and Figure 4, consisting of
the 23 maximum positive acceleration increments observed in
23 flights of an airplane and identified as
"NACA-Langley-Sample III," which are listed in the column
headed "Observed extremes $(+\Delta n)$" in Worksheet 1. These
data are assumed to be given in the order of observation, so
that under the above assumptions this arrangement may be
considered to be a random one.

* If the observations are not available in their original
order, it will first be necessary to randomize them by use of
a table of random numbers.

Each rule (except Nos. 2 and 7, which are subdivided)
consists of a single paragraph and this is followed by a
detailed explanation of its use, inserted for convenience of
the user. This makes the list unavoidably lengthy, but the
rules themselves are brief and simple to apply.

Before starting the calculations, it is desirable to plot the
data on special probability paper according to the directions
in rule 7(a), "Graphical analysis," below, in order to obtain
a crude judgment of how well the data fit the assumed
distribution. In rearranging the data in order of size,
however, care should be taken not to lose the record of the
original order in which the data were taken, because
randomness will then have to be reintroduced.
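The preliminary plot only requires the ranked observations and their plotting positions r/(n+1). A minimal sketch follows; it works on a sorted copy so that the original order of observation is preserved. The four sample values are illustrative only and are not the data of the worked example, and the function names are hypothetical.

```python
import math


def plotting_positions(sample):
    """Return (rank, value, r/(n+1)) for the sample in increasing order."""
    xs = sorted(sample)  # sorted copy; the input list keeps its order
    n = len(xs)
    return [(r, x, r / (n + 1)) for r, x in enumerate(xs, start=1)]


def reduced_ordinate(position):
    """Where a plotting position falls on the linear reduced scale y."""
    return -math.log(-math.log(position))


observed = [1.02, 0.87, 1.31, 0.95]  # illustrative values only
rows = plotting_positions(observed)
for r, x, pos in rows:
    print(r, x, round(pos, 4), round(reduced_ordinate(pos), 4))
```

Plotting the observed value against `reduced_ordinate(pos)` on ordinary graph paper is equivalent to using extreme probability paper, on which the assumed distribution appears as a straight line.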

Determination of estimators - Worksheet 1.

1. Enter the observations in the second column in the order
   in which given. The first column is for identification
   purposes.

2. Determine partition of sample size (if 7 or more, but not
   extremely large) and split sample into subgroups according
   to the following rules (a), (b) or (c). If n is extremely
   large, say several hundred or more, see Appendix C.

   (a) If n is an exact multiple of 5 or 6, write n = k·5 or
       n = k·6; if both, use n = k·6.

   (b) If n is not an exact multiple of 5 or 6, write
       n = k·6 + m', m' < 6, unless n = 31, 61, etc., i.e.,
       one plus a multiple of 30.

   (c) If n is of the form 30k + 1, write it as
       n = (30k - 5) + 6 = (6k - 1)5 + 6, i.e., split n up
       into (6k - 1) subgroups of 5 and a "remainder"
       subgroup of 6.

   Once k, m, m' are determined the blanks in Section I can
   be filled in. At the same time, in Worksheet 2, the
   numerical values of m, m' should be entered as subscripts
   in the headings "$Q_m$" and "$Q_{m'}$" for columns 4 and 5,
   respectively. In the worked example, n = 23 = 3·6 + 5
   (rule (b)), so the data are split into 3 main subgroups of
   6 and a remainder subgroup of 5.
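Rules (a) to (c) can be sketched as a small function. The name `partition` and the returned triple (k, m, m') are illustrative choices; extremely large samples, for which the text refers to Appendix C, are not treated separately here.

```python
def partition(n):
    """Return (k, m, m_rem) with n = k*m + m_rem, following rules (a)-(c)."""
    if n < 7:
        return 1, n, 0             # no splitting is needed for n <= 6
    if n % 6 == 0:
        return n // 6, 6, 0        # rule (a); multiples of both 5 and 6 use 6
    if n % 5 == 0:
        return n // 5, 5, 0        # rule (a)
    if n % 30 == 1:
        return (n - 6) // 5, 5, 6  # rule (c): (6k - 1) subgroups of 5, plus 6
    return n // 6, 6, n % 6        # rule (b)


for n in (23, 30, 25, 31):
    print(n, partition(n))
```

For the worked example, `partition(23)` gives 3 main subgroups of 6 and a remainder subgroup of 5.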

Number of decimal places. - As a result of considerable
experimentation it is recommended that all computations be
carried to exactly the number of places shown for each item
on the two worksheets.

3. Find estimators for the parameters $\xi_P$, u, $\beta$ by
   filling in the blanks and following the directions
   indicated in Worksheet 1, Sections IIA, IIB, III.

   In Section IIA, obtain the weights $a_i$, $b_i$ from Table
   I for n = m, the size of the main subgroups. Mark off the
   subgroups by any convenient means,* arrange the
   observations in increasing order within each subgroup and
   enter horizontally opposite

* It was found convenient here to determine the subgroup size
m before entering the data in the extreme left columns, so
that the subgroups could be plainly indicated by means of a
space after every m-th observation.

   the proper subgroup number in Section IIA. Obtain the two
   product sums $\sum_{i=1}^{m} a_i x_i$,
   $\sum_{i=1}^{m} b_i x_i$ as indicated in the two right-hand
   columns and sum all columns as shown. The two product-sums
   evaluated for the line labelled "Sum" will serve as a
   check. Form the average, T, by dividing by the number, k,
   of main subgroups.

   The work in Section IIB is analogous, except that the
   weights $a'_i$, $b'_i$ are the $a_i$ and $b_i$ shown in
   Table I for n = m', the size of the remainder subgroup;
   also, since there is only one subgroup, averaging is
   unnecessary.

   Section III combines the (sub)estimators T and T' with the
   proportionality coefficients t, t', determined in Section
   I, to produce the final over-all sample estimator

       $\hat{\xi}_P = tT + t'T' = .92946 + .16774\, y_P$,

   on collecting the coefficients of $y_P$ and the constant
   terms. The estimates of the parameters u, $\beta$ are read
   off at once from the coefficients of $\hat{\xi}_P$ and
   entered. This constitutes the fitting of an extreme-value
   distribution to the given data.

Predicted values (etc.) - Worksheet 2.

4. Compute the values of $\hat{\xi}_P$ in column 3 for the
   values of P and $y_P$ shown in columns 1, 2. These values
   constitute the set of predictions for the respective
   probability levels.

   Additional probability levels may be inserted between
   those shown, if desired. The value of
   $y_P = -\log_e(-\log_e P)$ is found most conveniently from
   Table 2 of reference 15.
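Where a table of the reduced variate is not at hand, $y_P$ can be computed directly, and additional probability levels can be inserted in the same way. The sketch below uses the rounded estimates .9295 and .1677 quoted later for the fitted line of the worked example; the function names are illustrative.

```python
import math


def reduced_variate(p):
    """y_P = -log(-log P), natural logarithms, for 0 < P < 1."""
    return -math.log(-math.log(p))


def predicted_value(u_hat, beta_hat, p):
    """Column 3 of Worksheet 2: xi_hat_P = u_hat + beta_hat * y_P."""
    return u_hat + beta_hat * reduced_variate(p)


for p in (0.50, 0.90, 0.95, 0.999):
    y = reduced_variate(p)
    print(p, round(y, 5), round(predicted_value(0.9295, 0.1677, p), 4))
```

The limiting level P = 1 cannot be passed to `reduced_variate`, since $y_P$ is then infinite; that line of the worksheet is handled through the coefficient of $y_P$ alone.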

5. Confidence-band half-widths (68-percent control curves)
   are computed from the standard deviations as indicated.

   The numerical values of the variances $Q_m$, $Q_{m'}$ in
   columns 4 and 5 are found under these same headings in
   Table III, Part A, and entered as shown. The values of
   $t^2/k$, $t'^2$ are entered above these values, as
   indicated, in order to facilitate computation of the
   variances of the overall estimator,

       $\mathrm{Var}(\hat{\xi}_P) = \frac{t^2}{k} Q_m + t'^2 Q_{m'}$,

   in column 6. Column 7 gives the standard deviation of the
   estimator $\hat{\xi}_P$. It is most easily computed by
   taking the square root of the coefficient of $\beta^2$ in
   column 6 and multiplying by the value $\hat{\beta}$ found
   in Section III of Worksheet 1. Thus $\sigma(\hat{\xi}_P)$
   for P = .50 is the square root of .0605T times the value
   $\hat{\beta} = .16774$ (written at the top of column 7 for
   convenience), giving the value .0413 shown.

   The standard deviation of the estimator measures the
   reliability, that is, the extent to which repeated
   application of the procedure to repeated samples taken
   under the same conditions would give values clustering
   more or less closely about the unknown parameter value.
   For example, for a fixed probability P, in about 68
   percent of the time (when the assumptions are satisfied)
   the computed interval $\hat{\xi}_P$ plus or minus one
   standard deviation will contain the true unknown
   parameter, $\xi_P = u + \beta y_P$. For two standard
   deviations the percentage rises to 95.* Two curved lines,
   one joining the left-hand endpoints of these intervals and
   one joining the right-hand endpoints, are called control
   curves (see rule 7 for graphical analysis, below) and
   these two curves define a confidence band consisting of
   the area between them. The interval of values of the
   abscissa $x = \hat{\xi}_P$ included between the

* These percentages are only approximate since they assume
$\hat{\xi}_P$ to be normally distributed. As indicated in
Appendix D, this assumption is sufficiently correct for
practical purposes for samples of the order of 100 or more.
This may, of course, not be the case for much smaller
samples. However, normality assumptions of this kind must
often be made in practice in the absence of large-scale
investigations to establish more precise distributions.
Results obtained in this manner have often been found to be
satisfactory.


   control curves, when P is given a specific value, is
   called a confidence interval.

   The standard deviation in column 7 is thus the half-width
   of a 68-percent confidence band (or interval). If levels
   of 95 percent, etc., are desired, the values can be
   readily obtained by adding another column consisting of
   twice (etc.) the entries in column 7.
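The half-width computation can be sketched as below. The value 0.16774 is the scale estimate of the worked example; the variance coefficient is entered as 0.0605, which is only the leading part of the column-6 figure quoted for P = .50, so the result is a check on the order of magnitude rather than a reproduction of the worksheet entry. Function names are illustrative.

```python
import math


def half_width(beta_hat, var_coeff, multiple=1.0):
    """Confidence-band half-width: multiple * beta_hat * sqrt(var_coeff).

    var_coeff is the coefficient of beta**2 in column 6 of Worksheet 2;
    multiple = 1 gives the 68-percent band, multiple = 2 roughly 95 percent.
    """
    return multiple * beta_hat * math.sqrt(var_coeff)


def interval(xi_hat, beta_hat, var_coeff, multiple=1.0):
    """End points of the interval xi_hat minus and plus the half-width."""
    h = half_width(beta_hat, var_coeff, multiple)
    return xi_hat - h, xi_hat + h


# 0.0605 is a truncated coefficient (see the lead-in); 0.16774 is beta_hat.
print(round(half_width(0.16774, 0.0605), 4))
print(round(half_width(0.16774, 0.0605, multiple=2.0), 4))
```

As the footnote to this rule points out, the 68 and 95 percent levels rest on an assumption of approximate normality.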

6. Efficiency is computed as follows. The values of $Q_0$ for
   the indicated values of P are taken from the column headed
   $Q_0$ in Table III, Part A, divided by the given sample
   size n, and entered in the column, 8, of Worksheet 2. The
   efficiency is obtained by dividing this by the
   corresponding entry in column 6, canceling the $\beta^2$
   (which was one reason for carrying it along separately),
   and finally entering the result in column 9.

7. Graphical analysis consists of plotting the data on
   suitably ruled paper, drawing the estimated straight line,
   drawing in the control curves, and seeing how well the
   data fall within them. The method is essentially due to
   Gumbel (cf. reference 5).

   (a) In the section of Worksheet 2 called Plotting
       Positions, arrange all n observations in the sample in
       a single ascending series from smallest to largest and
       enter them opposite the rank numbers r = 1 to n.
       Compute and enter the plotting positions
       $\Phi(x_r) = r/(n+1)$. Then, on a sheet of Extreme
       Probability Paper* such as used in Figures 4 and 5,
       plot the points $(x_r,\, r/(n+1))$. The observation
       $x_r$ is plotted on the uniform scale along the
       horizontal axis; the fraction $r/(n+1)$ is plotted
       along the nonuniform vertical scale $\Phi(x)$. These
       points are plotted as shown in Figure 4.

# I.e.,  coordinate  paper  with  one  scale  (x)  uniformly 
spaced  and  the  other  (y)  distorted  In  such  manner 
that  the  extreme -value  distribution  exp(-e~y)  will 
. plot  as  a straight  line* 


• v . . ; • ' 
(b) After the points are plotted the
estimated line $\hat{x} = \hat{u} + \hat{\beta} y$, i.e.,
x = .9295 + .1677y (see Rule 3,
above), is drawn through them. This
is easily done from columns 2 and 3
(Worksheet 2), since column 3 gives
the predicted values of x ($= \hat{\xi}_P$)
corresponding to the values of y ($= y_P$)
in column 2. An even simpler method
is to take two or three widely separated
values P in column 1 together with the
corresponding values $\hat{\xi}_P$, plot them on
the $\Phi(x)$ and x scales, respectively,
and draw the line through them.

(c) The 68-percent control curves are
obtained by measuring off horizontally,
at each value of P in column 1, the
distance $\sigma(\hat{\xi}_P)$, taken from column 7, to
the right and left of the fitted line,
and then joining all the right and
all the left endpoints of the intervals
so formed, as in Figure 4. The area
included between the two control curves
is the 68-percent confidence band.

If most or all of the plotted points
fall within the band, as in Figure 4,
then we conclude that the fit is satisfactory
and furnishes no evidence
that any of the basic assumptions are
violated.

(d) The fitted straight line provides the
predictions for any desired probability
level P.* For example, the prediction
for P = .995, which means a value of
acceleration increment which has only
one chance in 200 of being exceeded,
is obtained (in Figure 4) by reading
across to the solid (fitted) line at
P = .995, and down to find the value
$\hat{x}$ = 1.82 g. This is sufficiently close
to the value 1.8176 obtained by calculation,
using the value $y_{.995}$ = 5.29581.
The 68-percent curves give a confidence
interval for this value of approximately
1.66 to 1.98. This means that there
is a probability of about two-thirds
that such an interval includes the
true predicted value $\xi_P$ we are
trying to estimate. The efficiency
associated with this estimate is
between 80.3 percent and 82.6 percent
(column 9), sufficiently narrow limits
for practical purposes. If a more
accurate value for the prediction
or measure of efficiency is desired,
it can be readily obtained by inserting
a "P = .995 line" in the first
table on Worksheet 2 and performing
the computations indicated in columns
2 through 9.

* On the probability paper (Figures 4, 5), P is
denoted by $\Phi(x)$.
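
The arithmetic behind Rule 7 can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, the five demonstration observations are invented for illustration (they are not the Worksheet 1 data), and the fitted values .9295 and .1677 are those quoted in step (b).

```python
import math

def reduced_variate(p):
    """Return y_P, the solution of exp(-exp(-y_P)) = p."""
    return -math.log(-math.log(p))

def plotting_positions(observations):
    """Pair each ordered observation x_r with its plotting position r/(n+1)."""
    xs = sorted(observations)
    n = len(xs)
    return [(x, r / (n + 1)) for r, x in enumerate(xs, start=1)]

def predict(p, u_hat, beta_hat):
    """Predicted value on the fitted line x = u_hat + beta_hat * y_P."""
    return u_hat + beta_hat * reduced_variate(p)

# Fitted line quoted in step (b): x = .9295 + .1677 y
U_HAT, BETA_HAT = 0.9295, 0.1677

y_995 = reduced_variate(0.995)
x_995 = predict(0.995, U_HAT, BETA_HAT)
print(f"y_.995 = {y_995:.5f}, predicted x = {x_995:.4f}")

# Hypothetical observations, used only to show the plotting positions.
demo = [1.12, 0.87, 0.95, 1.30, 1.01]
for x, position in plotting_positions(demo):
    print(f"x = {x:.2f}  plotted at  {position:.4f}")
```

Reading the graph and computing from the fitted line should agree to within plotting accuracy, which is the comparison made in step (d).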


6. COMPARISON WITH METHOD IN PRESENT USE


It is of interest to compare the proposed
order-statistics method with the method of moments
of E. J. Gumbel which has been used up to now in
extreme gust-load computations (reference 18). The
comparison is presented in two aspects: theoretical,
involving an empirical attempt to evaluate the bias
and efficiency* of the Gumbel estimator; and practical,
showing how the two methods work out in an actual
example.

6.1 Theoretical Comparison

Only the general results will be indicated
here, the details being furnished in Appendix E.

The comparison consists in writing down the Gumbel
estimator, a function of the observations involving
the sample mean, standard deviation, and the probability
factor $y_P$, and then obtaining the bias and
the relative efficiency of the proposed order-statistics
estimator to the Gumbel estimator.

* For theoretical comparison of confidence bands,
see Appendix D.

Of the two characteristics, bias and efficiency,
the main interest at this point is in determining the
efficiency of the proposed method, since that is the
important feature whereby possibilities of cost
savings, through taking fewer observations, can arise.
Bias is less important for this purpose, and its
consideration is therefore limited to Appendix E.

As shown in Appendix E (Section 1), relative
efficiency involves the first two moments of the
sample mean and sample standard deviation, and the
covariance of the mean and standard deviation. Of
these, only the first two moments of the sample
mean can be obtained readily by standard procedures,
while the remaining three quantities would require
a prohibitive amount of numerical integration to
evaluate accurately.


* The present discussion compares the order-statistics
estimator with the Gumbel estimator
$\hat{x} = \bar{x} + \frac{\sqrt{6}}{\pi}(y_P - \gamma)\,s_x$.
As explained in Appendix E, this estimator
is a simplified form of Gumbel's original estimator,
and is used when the sample of extremes is large.
Appendix E also considers the original Gumbel
estimator, which is a more complicated expression
used for small samples, and shows that this estimator
is both more biased and much less efficient than
the simplified estimator.

Resort was therefore had to a method whereby
the theoretical extreme-value distribution was represented
by a large set of suitably constructed random
numbers. By means of these numbers a large number
of actual random samples were drawn and the results
tabulated. This was carried out mechanically with
high-speed IBM equipment. By using 12,000 random
numbers, 1200 random samples of 10 were drawn and a
single average figure for relative efficiency was
computed for each set of 100 samples. All these
computations were made for the single probability
level P = .95. Other values of P are considered
below.
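
The sampling scheme just described can be imitated today in a few lines. The sketch below is not the procedure run on the IBM equipment: it assumes inverse-transform sampling from the reduced distribution $\exp(-e^{-y})$ with u = 0 and $\beta$ = 1, uses the divisor-n standard deviation (an assumption), and tabulates only the mean squared error of the simplified moment estimator for each set of 100 samples. The order-statistics estimator is omitted because its weights (Table I) are not reproduced here. All function names are hypothetical.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329

def gumbel_variate(rng):
    """Inverse-transform draw from the reduced distribution exp(-exp(-y))."""
    u = rng.random()
    while u == 0.0:          # avoid log(0); random() lies in [0, 1)
        u = rng.random()
    return -math.log(-math.log(u))

def moment_estimate(sample, p):
    """Simplified moment estimator: xbar + (sqrt(6)/pi) * (y_P - gamma) * s."""
    y_p = -math.log(-math.log(p))
    xbar = statistics.mean(sample)
    s = statistics.pstdev(sample)   # divisor n; an assumption of this sketch
    return xbar + (math.sqrt(6.0) / math.pi) * (y_p - EULER_GAMMA) * s

def batch_mse(n, p, batches=12, per_batch=100, seed=1953):
    """Mean squared error of the estimator, one figure per set of samples."""
    rng = random.Random(seed)
    true_value = -math.log(-math.log(p))   # xi_P when u = 0 and beta = 1
    figures = []
    for _ in range(batches):
        errors = []
        for _ in range(per_batch):
            sample = [gumbel_variate(rng) for _ in range(n)]
            errors.append((moment_estimate(sample, p) - true_value) ** 2)
        figures.append(statistics.mean(errors))
    return figures

# 12 sets of 100 samples of 10, i.e. 12,000 random numbers in all.
for k, value in enumerate(batch_mse(10, 0.95), start=1):
    print(f"set {k:2d}: mean squared error = {value:.4f}")
```

The spread among the twelve printed figures gives a feel for the sampling variation that limits how much can be concluded from sets of 100 samples.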

The results are shown in Table V and portrayed
in Figure 6. For samples of 10, the efficiency was
greater for the proposed order-statistics estimator
in 5 cases out of 12 (relative efficiency R (column
8) greater than 1) and greater for the present moment
estimator in 7 cases out of 12. The average of all
12 relative efficiencies was very nearly unity. These
results suggest that, for samples of 10, the two
methods are equally efficient.

The entire procedure was repeated for samples
of 20, obtaining 6 (instead of the previous 12) values
for the 6 sets of 100 samples each. As Table V (column
9) and Figure 6 show, the balance now was 5 to 1 in
favor of the proposed method, with the average being
1.11, representing 11 percent greater average
efficiency for the proposed method.

For samples of 30, there were 4 sets of 100
samples each, and the results (column 10) were 3
to 1 in favor of the proposed method. The average
relative efficiency was 1.13, representing a 13
percent gain in average efficiency.

To see the effect of different probability
levels on these results, computations were undertaken
for several values of P beyond .95. However,
in order to avoid needless calculation, in view of
the fact that only qualitative conclusions are
warranted, the above procedure was modified as
follows. The sets of 100 samples were combined for
each sample size, and a single overall average for
relative efficiency was obtained for the 1200 samples
of 30, the computations being carried out for the
selected probabilities P = .95, .99, .999, and the
limiting value, unity. The results are shown in
Table VI. In addition, theoretical calculations**
were made to obtain the asymptotic relative efficiencies
as sample size increases without limit. These values
will be found at the bottom of column 9 of Table VI.

The above additional results indicate that
increasing the probability P tends to increase the
efficiency of the proposed method relative to Gumbel's.
It should be pointed out that these values
obtained from the empirical sampling method are
indicative, rather than conclusive, on account of
the random variation inherent in the method, as manifest
in the wide fluctuation in efficiencies shown
in Table V for the individual sets of 100 samples.
Nevertheless, the above results do give strong indication
for the following statement:

    For samples of 10, the proposed order-
    statistics method is about as efficient
    as the method of Gumbel, while for
    samples of 20 or 30 or more, the proposed
    method is more efficient.
    For P = .95 or greater, this increase in
    efficiency is about 12 to 15 percent
    for samples of 20 to 30, and ultimately
    rises to 25 to 30 percent for indefinitely
    large samples.

** Since these calculations are mainly of theoretical
interest, they have been omitted in order to keep
this report from becoming unduly long.

If, in the comparison presented above, the
simplified Gumbel estimator is replaced by the
original form of the estimator (Appendix E, Section
2), then the comparison becomes much more favorable
to the proposed order-statistics method and it can
be stated that:

    For samples of 10, 20, and 30, and
    P = .95 or more, the order-statistics
    method is up to twice as efficient as
    the Gumbel method using the original
    estimator. Moreover, this 100 percent
    difference in efficiency between
    the two methods is of sufficient magnitude
    not to be significantly affected
    by the sampling errors inherent in the
    method of evaluation.


6.2 Comparison Based on a
Sample of Actual Observations

We shall use the same data already analyzed
by the order-statistics method, consisting of the
23 maximum acceleration increments listed in Worksheet
1. For convenience we shall use a standard
form of worksheet, employed by the Environmental
Protection Section of the Office of the Quartermaster
General, Department of the Army (reference 17),
for applying the method of moments of E. J. Gumbel.

To avoid confusion with the Worksheets 1 and 2 discussed
previously, these new worksheets shall be
referred to as Table VII, Part A and Part B. The
items are filled in on both parts as directed, except
that the factor N/(N-1) is ignored in Sections I
and IV, since subsequent theoretical investigation
has shown its use to be incorrect; also, the values
x^q  and  x^Q^  in  Section  III,  and  the  entire  Section 
V are not needed for our purposes. The values of
$\sigma_n$ and $\bar{y}_n$ in Section II are taken from a table
supplied with the worksheets but omitted here.

Comparison is best shown graphically, as in
Figure 5. It will be seen that in this particular
case the fitted lines given by the two methods are
not greatly different, the predicted values differing
by amounts varying from .03 g at the P = .95
level (1 chance in 20 of being exceeded) to nearly
.10 g for P = .999 (1 chance in 1,000 of being exceeded).

The most striking and significant feature about
the comparison in Figure 5 is the narrowness
of the confidence band for the order-statistics method
compared with that of the Gumbel method. This is
attributable mainly to the fact that in the case
of the order-statistics estimator the confidence-
band width is based on the standard deviation of
the estimator, computed by the methods indicated in
this report, whereas in the case of the moment (Gumbel)
estimator, the standard deviation, whose value is not
known, is replaced by a standard deviation that can
be readily calculated, but which results in an unnecessarily
wide confidence band (for details see
Appendix D).


6.3 Advantages and Limitations
of Proposed Method


From  the  discussion  given  herein  it  appears 
that  the  proposed  order-statistics  method  offers 
the  following  advantages  over  the  method  of  moments 
now  in  use: 

a.  The  proposed  method  provides  for  the 
first  time  an  estimator  known  to  be 
unbiased,  whose  efficiency  can  be  simply 
and  accurately  evaluated. 
b. The new estimator is more efficient
than a simplified form of the Gumbel
estimator, for samples of about 20
or more and P = .95 or more.

Compared with the original form of
the Gumbel estimator, the new
estimator is up to twice as efficient
for the same range of values of P
and for samples of 10 or more.

c. The calculations necessary for the
proposed method are simple and unified,
giving simultaneously (i) estimates
of both parameters, (ii) the predicted
values corresponding to assigned probabilities
and the reliability of these
values, and (iii) estimates of the
efficiency of the method.

d. The proposed method uses a more
valid procedure for obtaining the reliability
of predicted values, and
this procedure yields smaller
confidence intervals in many cases.
(See Appendix D.)

The following two limitations of the proposed
method should be kept in mind:

a. As is true of any other method of
analyzing data, its use is appropriate
only when the assumptions upon which
it is based may be considered to be
approximately satisfied, namely:
all the observations constitute an
independent random sample from the
same population $P(x) = \exp(-e^{-(x-u)/\beta})$
(in cumulative form).

b. The assumption that the data are to
be available in the order in which
observed is of some importance.
For if the data are first rearranged,
grouped or processed in any manner,
their randomness must be considered
lost. In order to use the
proposed method it will then be
necessary to restore randomness
by use of a table of random numbers
to rearrange the data. This is
less desirable and the original
order should therefore be preserved
if possible.

This necessity of avoiding preliminary
processing imposes a disadvantage on the proposed
method, as compared with the Gumbel method of
moments, when the sample is very large (several hundred
or more, say). In the latter method the data
may be grouped, simplifying the computations. The
method of order statistics, on the other hand, is
not applicable with grouped data - each observation
must be treated on an individual basis - and hence
is not suitable for occasional enormous samples,
as is the Gumbel method. However, for such masses
of data an even simpler method, described in Appendix
C, is available.
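
Limitation b calls for re-randomizing data whose original order has been lost. A minimal sketch of that step follows, assuming a pseudo-random generator as a stand-in for the table of random numbers; the function names, the seed, the subgroup size of 5, and the sample values are all hypothetical and not taken from this report.

```python
import random

def restore_random_order(values, seed=None):
    """Return the observations in pseudo-random order (stand-in for a random-number table)."""
    rng = random.Random(seed)
    shuffled = list(values)      # copy, so the caller's data are left untouched
    rng.shuffle(shuffled)
    return shuffled

def consecutive_subgroups(values, size):
    """Split the randomly ordered observations into consecutive subgroups of at most `size`."""
    return [values[i:i + size] for i in range(0, len(values), size)]

# Hypothetical data that reached the analyst already sorted.
grouped = [0.87, 0.95, 1.01, 1.12, 1.30, 1.44, 1.58]
reordered = restore_random_order(grouped, seed=7)
print(consecutive_subgroups(reordered, 5))
```

Preserving the original order of observation remains preferable, as the text states; the shuffle is only a fallback.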
7. CONCLUDING REMARKS


This report has developed and illustrated a
new method of analyzing extreme-value data based on
order statistics that is simple to use and offers
certain important advantages over the method of
moments of Gumbel now in use, as well as being subject
to certain limitations (see Section 6.3).

In  view  of  these  considerations,  this  new 
method  is  recommended  for  practical  use  in  place 
of  the  present  method  of  estimation. 

In developing an estimator intended to be
useful and efficient a number of subsidiary questions
were encountered and treated. The most important
of these were (i) obtaining minimum-variance unbiased
linear functions of order statistics for small samples,
and (ii) finding the most feasible way of breaking
up a large sample into subgroups small enough to
take advantage of the results in (i). In addition,
considerable attention was given to a number of
theoretical points of difference between the present
and proposed methods.

Such theoretical study showed that one
feature of the present Gumbel method, namely determination
of the confidence intervals or control curves
for large values of P, does not have a suitable
theoretical basis, and that certain adjustments
should be made in the formulas. These adjustments
would have the effect of replacing the parallel
control lines by diverging curves in the regions
of high values of P, resulting in smaller confidence
intervals for the more common values of P and larger
intervals for the higher values of P that occur
less often in practice.

The solutions to the above two main auxiliary
problems have been incorporated into a simple set
of tables and a pair of unified worksheets designed
so that the computations show at a glance the essential
quantities of interest - the actual predictions, their
reliability, and the efficiency of the method. The
method includes provision for showing these results
graphically.

The present study has also devoted some attention
to a method involving empirical random
sampling and IBM tabulating equipment in cases
where direct numerical evaluation is prohibitive.
The use of 12,000 random numbers and from 400 to
1200 random samples was found insufficient to yield
accurate quantitative results for one form of the
Gumbel estimator (the simplified form) on account of
sampling variation. However, definite qualitative
results in favor of the proposed method were indicated
in the case of samples of 20 and 30 and theoretical
calculation showed that this advantage was considerably
greater for indefinitely large samples.

As  a result  of  the  experience  gained  in  these 
studies  it  seems  likely  that  for  accurate  results 
perhaps  ten  times  the  number  of  samples  used  (or 
more)  should  be  taken  and  the  computations  performed 
through  specialized  procedures  on  high-speed  electronic 
computing  equipment. 

Further calculation showed that in the case
of the original form of the Gumbel estimator,
much more definite statements were possible concerning
efficiency. In this comparison the proposed estimator
turned out to be up to twice as efficient as that of
Gumbel, not only for the sample sizes 20 and 30, but
down to samples of 10 as well. For very large samples
this advantage dropped somewhat, but the proposed
estimator remained at least 20 to 30 percent more
APPENDIX A


PROOF THAT SUFFICIENT STATISTICS DO NOT EXIST
FOR THE PARAMETERS OF THE
EXTREME-VALUE DISTRIBUTION*

(Relates to Section 4.2 of text)


Problem: We have a sample of n from the
extreme-value population whose density function
is

$$f(x) = a\,e^{-a(x-u) - e^{-a(x-u)}} .$$**

$\beta = 1/a > 0$ and u are unknown and we wish to find
sufficient statistics for them.

Theory: (1) If $t = (t_1, \ldots, t_k)$ is sufficient
(i.e., is a set of jointly sufficient statistics)
for $\theta = (\theta_1, \ldots, \theta_m)$ then the density function of
$x = (x_1, \ldots, x_n)$ may be written in the form

$$P(x,\theta) = f(t,\theta)\,g(x) .$$

(2) If $t(x) = t(x')$ for sample points x and
x', then

$$A = \frac{P(x,\theta)}{P(x',\theta)} = \frac{g(x)}{g(x')} = h(x,x') .$$

* This Appendix has been prepared by I. Richard Savage
of the Statistical Engineering Laboratory, National
Bureau of Standards.

** For convenience the symbol a is used in place of
the parameter ($1/\beta$) of the text.
(3) Hence for all those points where t(x) has
a constant value the ratio A is free of $\theta$, and thus
we can find sufficient statistics by seeing for
which point sets A is constant.

(4) Evidently, if $\theta' = g(\theta)$
($\theta'_i = g_i(\theta_1, \ldots, \theta_m)$, $i = 1, \ldots, m$) is a non-singular
transformation of the parameters, then also

$$\frac{P(x,\theta')}{P(x',\theta')} = \frac{f(t,\theta')\,g(x)}{f(t,\theta')\,g(x')} = h(x,x') ,$$

using the same (set of estimators) t as for $\theta$. In
other words, if a set of statistics t is sufficient
for a set of parameters $\theta$, the same set t is sufficient
for any other set $\theta'$ obtained from $\theta$ by a non-singular
transformation.

Results: We will now apply the above theory
to the problem at hand and show that the largest point
set on which A is constant contains n! points, that is,
it takes n functions to describe t, so that the
resulting sufficient statistic is the trivial set,

2 n 

t = (x  , ..*,x  ) or  In  other  xjords, 

the only sufficient statistics are the n observations
themselves, so that we do not have a basis upon which
to construct optimum estimators.

Analysis: For the distribution f(x) we have

$$A = \exp\Big\{ a\Big(\sum_{i=1}^{n} x'_i - \sum_{i=1}^{n} x_i\Big) - \sum_{i=1}^{n}\big(e^{-a(x_i-u)} - e^{-a(x'_i-u)}\big)\Big\} .$$

If A is free of the parameters $\beta$ and u then it is
also free of $a = 1/\beta$ and u, and so are log A and
$\partial^k \log A/\partial a^k$. Hence

$$\log A = na(\bar{x}' - \bar{x}) - \sum_{i=1}^{n}\big[e^{-a(x_i-u)} - e^{-a(x'_i-u)}\big] .$$

Let u approach $-\infty$. We first find that $\bar{x} = \bar{x}'$ in
order to have log A free of u and a. Next,

$$\frac{\partial^k \log A}{\partial a^k} = (-1)^{k+1}\sum_{i=1}^{n}\big[(x_i-u)^k e^{-a(x_i-u)} - (x'_i-u)^k e^{-a(x'_i-u)}\big] = 0, \quad k > 1,$$

and this is true for k = 1 as well, since $\bar{x} = \bar{x}'$.

Since this is an identity in u let us set
u = 0. We then get

$$\sum_{i=1}^{n} x_i^k e^{-a x_i} = \sum_{i=1}^{n} x_i'^{\,k} e^{-a x'_i} .$$

These are finite sums; and therefore, since
they are identities in a, it is clear, since a
may converge to zero, that

$$\frac{\sum x_i^k}{n} = \frac{\sum x_i'^{\,k}}{n} .$$

Thus the largest set of points of constancy of A
are those points which give the same sample moments,
and this fact implies the desired result.

Note: Statement (4) above implies that the
result also holds if the parameter u is replaced by
$\xi_P = u + \beta y_P = u + y_P/a$.

Example: To show how this method works for
a familiar problem consider a sample of n from a
normal distribution; here

$$A = \exp\Big\{-\frac{1}{2\sigma^2}\Big[\sum (x_i-\theta)^2 - \sum (x'_i-\theta)^2\Big]\Big\} ,$$

$$-2\log A = \Big[\sum (x_i^2 - x_i'^{\,2}) - 2\theta \sum (x_i - x'_i)\Big]\Big/\sigma^2 ,$$

and clearly the necessary and sufficient condition
for A to be constant for all $\sigma$ and $\theta$ is that
$\sum x_i = \sum x'_i$, $\sum x_i^2 = \sum x_i'^{\,2}$, which is the classical result that
the first two moments are sufficient statistics.


APPENDIX B


MATHEMATICAL FORMULATION AND SOLUTION OF
MINIMUM-VARIANCE PROBLEM

(Relates to Section 4.3 of text)


We consider an estimator of $\xi_P = u + \beta y_P$ of
the form

$$L = \sum_{i=1}^{n} w_i x_i ,$$ (B.1)

where $x_1 \le x_2 \le \ldots \le x_n$ are the n order statistics
of a sample of n from the extreme-value distribution
(4.1), and seek to find the $w_i$ which minimize Var(L)
subject to

$$E(L) = \xi_P .$$ (B.2)

The estimator L in (B.8) below with weights so
determined is called the minimum-variance, unbiased,
(linear) order-statistics estimator for sample size
n.

Writing

$$x = u + \beta y ,$$ (B.3)

where y is the reduced variable corresponding to
x, we also have

$$x_i = u + \beta y_i ,$$ (B.4)

where $y_1 \le y_2 \le \ldots \le y_n$ are the n order statistics
of a sample of size n from the reduced distribution
$\exp(-e^{-y})$, free of parameters. From the above it
follows that

$$E(x_i) = u + \beta E(y_i) ,$$ (B.5)

since u and $\beta$, though unknown, are constants not
subject to sampling variation when the operation of
expectation is performed. The values $E(y_i)$ have been
tabulated in reference 14 for i = n(1)min(1, n-25),
n = 1(1)10(5)60(10)100.*

These results give readily

$$E(L) = \sum w_i (u + \beta E y_i) = \xi_P = u + \beta y_P .$$ (B.6)

This is required to be an identity for all values
of the parameters u, $\beta$. Equating their coefficients
gives the two conditions on the weights:

$$\sum_{i=1}^{n} w_i = 1 , \qquad \sum_{i=1}^{n} (E y_i)\,w_i = y_P ,$$ (B.7)

where $E y_i$ are the numerical values tabulated in
reference 14.

* The notation in the table cited differs from that
used here: $E(y_i)$ in this report corresponds to
$E(y_{n-i})$ in the table.

Turning to the variance, we have

$$\mathrm{Var}(L) = \sum_{i=1}^{n} w_i^2 \sigma_{x_i}^2 + \sum_{i=1}^{n} \sum_{j=1,\ j \ne i}^{n} w_i w_j \sigma_{x_i x_j} .$$

From (B.4) and the properties of the variances and
covariances of linear estimators we have

$$\sigma_{x_i}^2 = \beta^2 \sigma_{y_i}^2 = \beta^2 \sigma_i^2 , \qquad \sigma_{x_i x_j} = \beta^2 \sigma_{y_i y_j} = \beta^2 \sigma_{ij} ,$$

making an obvious simplification in notation, whence

$$V_n = \mathrm{Var}(L) = \Big(\sum \sigma_i^2 w_i^2 + \sum\sum_{i \ne j} \sigma_{ij} w_i w_j\Big)\beta^2 = \text{minimum subject to (B.7)}.$$ (B.8)

This is a constrained minimum problem for
variation in the unknown $w_i$, and is equivalent to
finding the (unconstrained) minimum of

$$G_1 = \Big(\sum \sigma_i^2 w_i^2 + \sum\sum_{i \ne j} \sigma_{ij} w_i w_j\Big)\beta^2 + 2\lambda_1\Big(\sum w_i - 1\Big) + 2\mu_1\Big[\sum (E y_i) w_i - y_P\Big] ,$$

where $\lambda_1$, $\mu_1$ are the Lagrange multipliers.* Since
$\beta^2 > 0$ is constant, though unknown, this is the same
as minimizing

$$G = \frac{G_1}{\beta^2} = \sum \sigma_i^2 w_i^2 + \sum\sum_{i \ne j} \sigma_{ij} w_i w_j + 2\lambda\Big(\sum w_i - 1\Big) + 2\mu\Big[\sum (E y_i) w_i - y_P\Big] ,$$

where $\lambda_1/\beta^2 = \lambda$, $\mu_1/\beta^2 = \mu$. Setting the derivatives
with respect to $w_k$, k = 1, 2, ..., n, equal to 0
and dividing by 2, we have

$$\sigma_k^2 w_k + \sum_{i=1,\ i \ne k}^{n} \sigma_{ik} w_i + \lambda + \mu E y_k = 0 , \quad k = 1, 2, \ldots, n.$$ (B.9)

These latter are n linear equations which, with the
two in (B.7), form a simultaneous system of (n+2)
equations in the (n+2) unknowns $w_1, w_2, \ldots, w_n$,
$\lambda$, $\mu$. The values of $\lambda$ and $\mu$ are useful as a check, since
if (B.9) is multiplied by $w_k$ and summed, the result,
in view of (B.7) which the $w_k$'s satisfy and (B.8), is

$$\frac{V_{n,\min}}{\beta^2} + \lambda + \mu y_P = 0 ,$$

that is, we should have

$$V_{n,\min} = -(\lambda + \mu y_P)\,\beta^2 .$$

The minimum value $V_{n,\min}$ will be denoted by $Q_n$.

* The temporary notation $\lambda$, $\mu$ should not be confused
with the symbols for moments.

Before solving the set (B.7), (B.9) it is
necessary to determine the coefficients in these
linear equations. The values of $E y_i$ are tabulated,
as already mentioned. The variances and covariances
$\sigma_k^2$, $\sigma_{ik}$ involve complicated integrals. The author
has been successful in expressing these integrals
in terms of simpler ones already tabulated (reference
12). The results are, for the variances,

$$\sigma_i^2 = E(y_i^2) - (E y_i)^2 ,$$

$$E(y_i^2) = \frac{n!}{(i-1)!\,(n-i)!} \sum_{r=0}^{n-i} (-1)^r \binom{n-i}{r}\, g_2(i+r) , \quad i = 1, 2, \ldots, n,$$

where

$$g_2(i+r) = \frac{1}{i+r}\Big[\frac{\pi^2}{6} + \big(\gamma + \log(i+r)\big)^2\Big]$$

and $\gamma$ = Euler's constant = .57721 56649...;* and for
the covariances,

$$\sigma_{ij} = E(y_i y_j) - (E y_i)(E y_j) ,$$

$$E(y_i y_j) = \frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!} \sum_{r=0}^{j-i-1} \sum_{s=0}^{n-j} (-1)^{r+s} \binom{j-i-1}{r} \binom{n-j}{s}\, \phi(i+r,\ j-i-r+s) ,$$
$$i < j; \quad i, j = 1, 2, \ldots, n,$$

where the function $\phi$ is defined by

$$2tu\,\phi(t,u) = (u-t)\,g_2(t+u) + t^2[g_1(t)]^2 - 2L\Big(1+\frac{u}{t}\Big) + \frac{\pi^2}{6} ,$$

in which $g_2$ is the same function as before,
$g_1(t) = \frac{1}{t}(\gamma + \log t)$, and

$$L(1+x) = \int_1^{1+x} \frac{\log w}{w-1}\,dw = \sum_{n=1}^{\infty} (-1)^{n+1} \frac{x^n}{n^2} = \frac{1}{2}(\log x)^2 + \frac{\pi^2}{6} - L\Big(1+\frac{1}{x}\Big)$$

is Spence's integral, which has recently been extensively
tabulated (to 12 places) in reference 16. The function
$g_1$ also occurs in an expression for the means:

$$E(y_i) = \frac{n!}{(i-1)!\,(n-i)!} \sum_{r=0}^{n-i} (-1)^r \binom{n-i}{r}\, g_1(i+r) .$$

* All logarithms are natural logarithms, to the base e.

The above formulas have been evaluated as far
as n = 6 and the results are listed in Table II.
The values in the table are believed to be accurate
to the number of places shown. Those for the means
agree (to within a unit in the 7th place) to the 7
places to which the means have previously been tabulated.

Table II thus provides the coefficients in
the system of equations (B.7), (B.9) in the $w_i$ and $\lambda$,
$\mu$. The right-hand sides of these (n+2) equations are
1, $y_P$, 0, ..., 0 and the solutions $w_i$, $\lambda$, $\mu$ are
linear combinations of these with numerical coefficients
which involve only $\sigma_i^2$, $\sigma_{ij}$, and $E y_i$, but not $y_P$.
Hence the solutions are all of the form

$$w_i = a_i + b_i y_P , \quad i = 1, 2, \ldots, n,$$
$$\lambda = c_1 + d_1 y_P ,$$
$$\mu = c_2 + d_2 y_P .$$

Substituting these $w_i$ in (B.8) we have an expression
of the form

$$Q_n = V_{n,\min} = (A_n y_P^2 + B_n y_P + C_n)\,\beta^2 .$$ (B.10)

The quantities $a_i$, $b_i$ for the weights $w_i$, and the
coefficients $A_n$, $B_n$, $C_n$ of $Q_n$, are given in Table
I for n = 2 to 6. The solution of the system of
equations became increasingly lengthy for increasing
values of n, with correspondingly diminishing accuracy,
so that the computations were discontinued beyond
n = 6. The procedures for handling samples larger
than n = 6 are explained in the text of this report.


APPENDIX  C 


SHORT-CUT  METHOD  FOR  VERY  LARGE  SAMPLES 
(Relates  to  Section  of  text) 

If we have a sample of several hundred or
more extreme observations, as may sometimes be the
case (e.g., reference 18, where a sample of 485
extremes was analyzed), it is possible to select just
three out of all the observations and from them obtain
useful estimators.

This technique is based on a method used by
F. Mosteller (reference 13) for samples from the
normal distribution. If the n sample values from a
(continuous) population whose density is f(x), when
arranged in ascending order, are denoted by the order
statistics $x_1, x_2, \ldots, x_n$, and n is very large, the
application of Mosteller's method involves taking the
observations whose ranks are $\lambda n$, $\mu n$, $\nu n$,* where
$0 < \lambda < \mu < \nu < 1$ with $\lambda$, $\mu$, $\nu$ suitably determined,
and choosing a and b so that

$$\hat{\xi} = a\,x_{\mu n} + b\,(x_{\nu n} - x_{\lambda n}) \qquad (C.1)$$

is an (asymptotically) unbiased estimator of the
parameter $\xi_P = u + \beta y_P$. (The reason for choosing
this particular form is discussed below.)

* When (as will generally be the case) the ranks $\lambda n$,
$\mu n$, $\nu n$ are not integers, they will be defined to
be the nearest integers to these quantities.


The mean and variance of the estimator $\hat{\xi}$ in
(C.1) are computed from the corresponding moments of
order statistics of the form $x_{\lambda n}$, with n very large
and $\lambda$ a proper fraction not too near 0 or 1. Under
these circumstances the theorem used by Mosteller
states that in the limit, as n increases indefinitely,

(i) $x_{\lambda n}$ becomes normally distributed, with mean

$$E(x_{\lambda n}) = t_\lambda \qquad (C.2)$$

and variance

$$\sigma^2_{x_{\lambda n}} = \frac{\lambda(1-\lambda)}{n\,[f(t_\lambda)]^2}, \qquad (C.3)$$

where $t_\lambda$ is defined by $\lambda = \int_{-\infty}^{t_\lambda} f(x)\,dx$; and

(ii) the covariance of any two order statistics
$x_{\lambda n}$ and $x_{\mu n}$, $\lambda < \mu$, is given by

$$\mathrm{cov}(x_{\lambda n}, x_{\mu n}) = \frac{\lambda(1-\mu)}{n\,f(t_\lambda)\,f(t_\mu)}, \qquad (C.4)$$

where $t_\mu$ is defined similarly to $t_\lambda$.


For $\hat{\xi}$ in (C.1) to be unbiased in the case
where f(x) is the extreme-value distribution,

$$E(\hat{\xi}) = \xi_P = u + \beta y_P \qquad (C.5)$$

must be an identity in u, $\beta$. We first note that
from previous discussion in the text (see Section
4.1), the parameter $\xi_P$ is precisely the abscissa of
the ordinate which cuts off the area P to the left.
Hence we have simply

$$t_\lambda = u + \beta y_\lambda, \qquad (C.6)$$

and likewise for $\mu$ and $\nu$.
Equations (C.1), (C.2), and (C.5) then give

$$a\,t_\mu + b\,(t_\nu - t_\lambda) = u + \beta y_P,$$

or

$$a\,(u + \beta y_\mu) + b\,(y_\nu - y_\lambda)\,\beta = u + y_P\,\beta,$$

from which, on equating coefficients of u and $\beta$,

$$a = 1, \qquad b = \frac{y_P - y_\mu}{y_\nu - y_\lambda}. \qquad (C.7)$$

In principle, the fractions $\lambda$, $\mu$, $\nu$ might be
determined so as to minimize the variance of $\hat{\xi}$ and thus
make its efficiency a maximum, but this would require
very extensive computation which would not be warranted
on account of the limited importance of efficiency
when the available sample is very large. (For example,
a 50 percent efficient estimator with a sample of
1,000 gives results equivalent to using a fully efficient
estimator with a sample of 500, still a large sample.*)
Instead we consider estimators of $\xi_P$ of the form

$$\hat{\xi} = \hat{u} + y_P\,\hat{\beta}, \qquad (C.8)$$

where $\hat{u}$, $\hat{\beta}$ are estimators of the two parameters u,
$\beta$, that involve the fewest possible number of order
statistics $x_{\lambda n}$ without undue sacrifice in efficiency
as computed for indefinitely large samples. The
aim is to find, with a minimum amount of computation,
separate unbiased estimators $\hat{u}$, $\hat{\beta}$ of the parameters
u, $\beta$, each of which has minimum variance or best
efficiency in some sense, in the hope that the linear
combination (C.8), which will also be unbiased, will
turn out to have efficiency which is not unreasonably
small. This is a heuristic method, since the fact
that $\hat{u}$ and $\hat{\beta}$ are efficient does not imply that their
combination $\hat{u} + y_P\hat{\beta}$ is efficient. Much better
estimators probably exist, but we are content to
obtain just one of reasonable efficiency.

* These considerations assume that the sample of
data is already at hand, perhaps by a survey already
made, such as the Thunderstorm Project mentioned in
reference 18. Of course, if it is a question of
planning for the securing of data, it is desirable
to use as efficient an estimator as possible, but
in that case, the investigation will rarely be
sufficiently extensive to provide samples large
enough for the method described in this Appendix
to be applicable.

It turns out that the modal parameter u can
be estimated by a single order statistic. E. J.
Gumbel has shown (reference ii, equation (50)) that
the value of $\mu$ for which $x_{\mu n}$ best (i.e., with least
variance or most efficiency) estimates u is $\mu$ = .20319.
For simplicity, we therefore replace $\hat{u}$ in (C.8) by

$$\hat{u} = x_{.20n}. \qquad (C.9)$$

The scale parameter $\beta$ requires at least two order
statistics, or rather their difference $x_{\nu n} - x_{\lambda n}$,
for estimation, multiplied by a suitable unbiasing
factor which will become absorbed in the expression
for b in (C.7). A considerable number of trials
indicates that the pair of values $\lambda$ = .03, $\nu$ = .85
gives an estimate of $\beta$ with efficiency probably close
to the maximum, if not actually maximum. Since we
are not seeking very precise results, this pair of
values is adopted here. Thus (C.1), in view of
(C.7), becomes

$$\hat{\xi} = x_{.20n} + 0.3256\,(y_P + 0.4759)\,(x_{.85n} - x_{.03n}). \qquad (C.10)$$
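The two numerical constants in (C.10) follow from (C.7) once the fractions .03, .20, .85 are fixed. The sketch below recomputes them, assuming only that the reduced variate is $y_p = -\log(-\log p)$ for the distribution $\exp(-e^{-y})$; the names `y`, `SCALE_FACTOR`, `Y_MU` and `predict` are illustrative.

```python
import math

def y(p):
    """Reduced variate y_p = -log(-log p) of the distribution exp(-exp(-y))."""
    return -math.log(-math.log(p))

LAM, MU, NU = 0.03, 0.20, 0.85           # fractions adopted in the text

SCALE_FACTOR = 1.0 / (y(NU) - y(LAM))    # multiplies (x_.85n - x_.03n); about 0.3256
Y_MU = y(MU)                             # about -0.4759

def predict(x_03, x_20, x_85, p):
    """Formula (C.10): estimate of xi_P from the three selected order statistics."""
    b = (y(p) - Y_MU) * SCALE_FACTOR     # coefficient b of (C.7), with a = 1
    return x_20 + b * (x_85 - x_03)

if __name__ == "__main__":
    print(round(SCALE_FACTOR, 4), round(Y_MU, 4))
```

If the three order statistics happen to equal the exact population quantiles, `predict` returns $u + \beta y_P$ exactly, which is the unbiasedness condition (C.5) in miniature.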

The variance of this estimator is obtained from the
rule

$$\mathrm{Var}\Big(\sum_{i=1}^{m} a_i x_i\Big) = \sum_{i=1}^{m} a_i^2\,\sigma^2_{x_i} + 2 \sum_{\substack{i,j=1 \\ i<j}}^{m} a_i a_j\,\mathrm{cov}(x_i, x_j),$$

which after simplification gives

$$\sigma^2(\hat{\xi}) = \left(8.6916\,d^2 - 0.0681\,d + 1.5442\right)\frac{\beta^2}{n}, \qquad (C.11)$$

where

$$d = 0.3256\,y_P + 0.1549.$$

Since $\hat{\xi}$ is unbiased, a measure of its efficiency
may be obtained by dividing its variance into the
Cramér-Rao lower bound $Q_{LB}$ (see equation (4.19) and
accompanying text; numerical values are given in the
$Q_{LB}$ column of Table III, Part A). The results are as
follows, for several values of P of interest:


   P                      Efficiency of $\hat{\xi}$

  .95                         .645
  .99                         .649
  .999                        .652
  1 (limiting value)          .660
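The coefficients of (C.11) and the efficiencies tabulated above can be reproduced from (C.3) and (C.4). The sketch below makes two assumptions that are not spelled out in this appendix: the density of the reduced distribution at the quantile $y_p$ is $p(-\log p)$, and the Cramér-Rao bound of equation (4.19) has the standard form $(\beta^2/n)\,[1 + (6/\pi^2)(1 - \gamma + y_P)^2]$. All function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329
LAM, MU, NU = 0.03, 0.20, 0.85

def y(p):
    """Reduced variate y_p = -log(-log p)."""
    return -math.log(-math.log(p))

def density(p):
    """Density of exp(-exp(-y)) at y_p; equals p * (-log p)."""
    return p * -math.log(p)

def var_q(p):
    """n * Var(x_{pn}) / beta^2 from (C.3)."""
    return p * (1 - p) / density(p) ** 2

def cov_q(p, q):
    """n * Cov(x_{pn}, x_{qn}) / beta^2 from (C.4); requires p < q."""
    return p * (1 - q) / (density(p) * density(q))

def variance_coefficients():
    """(c2, c1, c0) such that n * Var / beta^2 = c2*d^2 + c1*d + c0."""
    c2 = var_q(NU) + var_q(LAM) - 2 * cov_q(LAM, NU)
    c1 = 2 * (cov_q(MU, NU) - cov_q(LAM, MU))
    c0 = var_q(MU)
    return c2, c1, c0

def efficiency(p):
    """Assumed Cramer-Rao bound divided by the variance (C.11)."""
    d = (y(p) - y(MU)) / (y(NU) - y(LAM))
    c2, c1, c0 = variance_coefficients()
    variance = c2 * d * d + c1 * d + c0
    lower_bound = 1 + (6 / math.pi ** 2) * (1 - EULER_GAMMA + y(p)) ** 2
    return lower_bound / variance

if __name__ == "__main__":
    print([round(c, 4) for c in variance_coefficients()])
    for p in (0.95, 0.99, 0.999):
        print(p, round(efficiency(p), 3))
```

Under these assumptions the computed coefficients agree with 8.6916, 0.0681 and 1.5442 to about three decimals, and the efficiencies agree with the tabulated values to the places shown.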


Thus, this large-sample method of estimation is
slightly less than two-thirds efficient. However,
as noted above, such apparently low efficiency need
not be a serious matter in practice.

For convenience, a summary of the method
described above is given here.

Summary of Large-Sample Procedures

1. Arrange all n observations (assumed to be
independent and from the same extreme-value
distribution) in order of increasing size, and then
rank them from 1 to n.

2. By hand or mechanical sorting, select the three
observations $x_r$ whose ranks are the nearest
integers to .03n, .20n, and .85n. Denote these by
$x_{.03n}$, $x_{.20n}$, $x_{.85n}$.

3. Compute the predicted values $\hat{\xi}_P$ for various
probability levels P, by formula (C.10).

4. For each P compute the variance from formula
(C.11).

5. Take the square root of the variance to obtain
the standard deviation. This gives the half-width
of the 68-percent confidence band, since for
large samples the distribution of $\hat{\xi}$ approaches
normality. Similarly, twice the standard deviation
determines the 95 percent confidence band, and
2.58 standard deviations determines the 99 percent
band.

6. Obtain the efficiencies by dividing the variance
into the Cramér-Rao lower bound in Table III,
Part A.
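Steps 1 to 5 of the summary can be sketched as a single routine. This is an illustration under stated assumptions, not the report's own computing procedure: the unknown $\beta$ in (C.11) is replaced by the plug-in value $0.3256\,(x_{.85n} - x_{.03n})$, ranks are rounded with Python's built-in `round`, and the name `large_sample_prediction` is hypothetical.

```python
import math

def y(p):
    """Reduced variate y_p = -log(-log p)."""
    return -math.log(-math.log(p))

def large_sample_prediction(observations, p):
    """Return (estimate, standard deviation) for the P-level predicted extreme."""
    xs = sorted(observations)                       # step 1: order the sample
    n = len(xs)

    def pick(fraction):                             # step 2: nearest-integer rank
        rank = min(max(round(fraction * n), 1), n)
        return xs[rank - 1]

    x03, x20, x85 = pick(0.03), pick(0.20), pick(0.85)
    spread = x85 - x03
    d = 0.3256 * y(p) + 0.1549
    estimate = x20 + d * spread                     # step 3: formula (C.10)
    beta_hat = 0.3256 * spread                      # assumption: plug-in scale estimate
    variance = (8.6916 * d * d - 0.0681 * d + 1.5442) * beta_hat ** 2 / n  # step 4
    return estimate, math.sqrt(variance)            # step 5

if __name__ == "__main__":
    # Synthetic sample: exact quantiles of an extreme-value law with u = 10, beta = 2.
    sample = [10 + 2 * y((i - 0.5) / 1000) for i in range(1, 1001)]
    est, sd = large_sample_prediction(sample, 0.99)
    print(est, est - sd, est + sd)                  # 68-percent band
    print(est - 2 * sd, est + 2 * sd)               # 95-percent band
```

The band half-widths of one, two, and 2.58 standard deviations then follow step 5 directly.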


APPENDIX D


ANALYSIS OF CONFIDENCE INTERVALS IN ORDER-STATISTICS
METHOD AND METHOD OF MOMENTS OF GUMBEL

(Related to Section 5, Rule 5, and Section 6.2 of text)

1. Confidence intervals in order-
statistics method (based on
normality assumption)

In the text, Section 5, Rule 5, the confidence
intervals given for various confidence levels in the
proposed method are obtained by laying off a certain
number of standard deviations, computed for the
estimator $\hat{\xi}_P$, on either side of the estimated value given
by the fitted line. If this is done for different
values of P and the ends joined, as in Figure 4, a
confidence band is obtained. The number of standard
deviations given in the method (one for a confidence
level of 68 percent, two for a level of 95 percent)
is based on the assumption that the estimator $\hat{\xi}_P$ is
normally distributed. The purpose of this section
is to investigate this assumption more closely.

It will be recalled that the estimator $\hat{\xi}_P$ is
obtained by splitting the sample into a number of
equal groups with perhaps a remainder of different
size (see text in connection with equations (4.22),
(4.25) and (4.26)). Then $\hat{\xi}_P$ can be written (equation
(4.26))

$$\hat{\xi}_P = t\,\bar{T} + t'\,T',$$

where $\bar{T}$ is the average of a certain linear function
of the sample variables (equation (4.22)) taken over
the k subgroups, $T'$ is another linear function and
t, t' are constants. Thus $\hat{\xi}_P$ is the sum of two parts:
(1) an average of k independent random variables
($tT_i$),* all with the same distribution, and (2) a
single variable ($t'T'$) with a somewhat different
distribution. By the Central Limit Theorem in
probability (reference 1, page 215), according to which
the average of a number of random variables having
the same distribution (with first two moments existing)
is asymptotically normal as the number of variables
increases indefinitely, the first part is approximately
normal for large k. In fact, extensive experience
has shown that a normal distribution is often a
remarkably close approximation even if the number of
variables k is under 10. Furthermore, the first two
moments (actually all) of each variable certainly
exist; in fact the proposed method is based upon
their computed values. Hence, it is safe to say that
for k = 10 or more the first part is very closely
normal. The second part ($t'T'$) is a variable which
has the same general character as T (a weighted sum
of order statistics [equation (4.25)]) and hence is
believed not to impair significantly the approximate
normality of $\hat{\xi}_P$. Its influence is likely to be small,
especially if the number of other variables, k, is
large.

* These variables are independent because the subgroups
were assumed to be formed independently.


For samples as large as 100, k = 16 if broken
into subgroups of 6, or k = 20 if broken into
subgroups of 5. Since these values of k are considerably
larger than 10, the preceding discussion shows that it is
quite safe to assume normality for $\hat{\xi}_P$ for samples
of 100 or more, so that the corresponding multiples
of the standard deviation given above are sufficiently
accurate in such cases. In fact, it is likely that
the normal approximation remains good for practical
purposes down to samples of 50 or 60, becoming, of
course, worse as sample size decreases still further.
However, in the absence of knowledge about the exact
distribution of the order-statistics estimator $\hat{\xi}_P$
for smaller samples, the normal approximation is
apparently the only simple one available for determining
confidence limits. It may be noted that approximate
methods are also involved in determination of confidence
limits in the Gumbel method. This point is further
discussed in the following section.

A comparison of the confidence intervals (or
bands) for predicted extremes in the order-statistics
method and in the Gumbel method is of interest and is
presented in Table VIII, columns 2 and 4, 5 and 7,
8 and 10. It is to be noted that for prediction
probability P not beyond .99 the method of order statistics
gives an appreciably narrower band for sample sizes
20 and 30, and a significantly wider one for samples of
10. On the other hand, for values of P much beyond
.99, the order-statistics band widens very rapidly, as
compared to a constant width for the band in Gumbel's
method. However, there appears to be some question as to
the theoretical validity of the Gumbel confidence-band width
for large values of P. This point is also considered in the
following section.


2. Confidence intervals for largest
extremes in Gumbel method


2.1 Gumbel's derivation of
confidence intervals


The purpose of this section is to inquire
into the theoretical validity of the confidence
intervals (or confidence band) given for extreme
predictions in Gumbel's method.

In the Gumbel method the 68-percent confidence-
interval half-width for the largest in a sample of
n extremes and for all larger predicted values* is,
in Gumbel's notation (Table VII, Part B, Section IV),

$$\Delta_{x,n} = \frac{1.141}{\alpha} = 1.141\,\beta, \qquad \beta = 1/\alpha, \qquad (D.1)$$

where $\beta$ is the scale parameter (or rather, an estimate of
it) of the extreme-value distribution from which the
observations are assumed to come. To obtain the
confidence interval for a given prediction probability
P > n/(n+1), the value $\Delta_{x,n}$ is added to, and
subtracted from, the estimate given by Gumbel,
denoted by him by x (Table VII, Part B, Section III)
and in this report by $\hat{\xi}_G$. Gumbel's (68 percent)
confidence interval for predictions beyond the largest
observed extreme $x_n$ is thus given by

$$\hat{\xi}_G \pm 1.141\,\beta, \qquad (D.2)$$

where $\beta$ is the scale parameter (or an estimate
thereof) of the extreme-value population from which the
observed extremes x have been assumed to come:**

$$F(x) = \Phi(y) = \exp(-e^{-y}), \qquad y = (x-u)/\beta. \qquad (D.3)$$

The multiplier 1.141 used for the 68 percent confidence
band is obtained by setting C = .68 and solving for
y the equation

$$\Phi(y) - \Phi(-y) = C, \qquad (D.4)$$

which is parameter-free, and gives y(C) = y(.68)
= 1.14073 (reference page 6). Thus

$$y = -1.14073 \quad \text{to} \quad y = 1.14073 \qquad (D.5)$$

is the interval for the reduced variate that cuts
off (or corresponds to) a central area of .68 under
the extreme-value density curve shown in Figure 2.
The corresponding interval that cuts off the same
area under the original ("unreduced") x-distribution
thus has width given by the values (D.5) multiplied
by the scale factor $\beta$, since

$$x = u + y\beta.$$

The half-width is therefore 1.14073$\beta$, i.e., (D.1).

* That is, for all values of P beyond n/(n+1), which
is the probability assigned to the largest value
in the sample, $x_n$. For smaller P, the confidence
interval is given by a different method with which
we shall not be concerned inasmuch as the primary
interest is in large values of P corresponding to
extreme predictions.

** From the theory of extreme values the distribution
of the largest of the observed values, $x_n$, in a
sample of n extremes, is exactly an extreme-value
distribution that has the same scale parameter $\beta$.
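Equation (D.4) has no closed-form solution for y, but it is easy to solve numerically. The sketch below uses plain bisection; the function names are illustrative. One observation from running the numbers by hand: the quoted multiplier 1.14073 corresponds to C = .6827 (the normal one-standard-deviation probability, which the text rounds to .68), and 3.06685 corresponds to C = .9545.

```python
import math

def phi(y):
    """Reduced extreme-value distribution function exp(-exp(-y))."""
    return math.exp(-math.exp(-y))

def central_half_width(c, lo=0.0, hi=50.0, tol=1e-10):
    """Solve phi(y) - phi(-y) = c for y > 0 by bisection (0 < c < 1)."""
    def g(y):
        return phi(y) - phi(-y) - c

    # g is negative at y = 0 and increases to 1 - c > 0 for large y.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for c in (0.6826895, 0.9544997):
        print(c, round(central_half_width(c), 5))
```

The same routine gives the multiplier for any other confidence level C.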

The following discussion indicates that this
method of obtaining confidence intervals is inaccurate
in two respects: (1) the confidence interval is of
constant instead of increasing width for large P;
(2) the scale parameter used is inappropriate.

2.2 Constant width of
confidence interval

The above method of Gumbel of obtaining
confidence intervals (D.2) treats the estimator $\hat{\xi}_G$ as
though it has an extreme-value distribution with the
same scale parameter $\beta$ as in the population underlying
the observed extremes $x_i$ (including the largest
extreme $x_n$). This assumption cannot be considered
strictly valid since it implies that the confidence
width remains constant for all large values of P, as
(D.1) does not involve P. In other words, this asserts
that from a sample of 20 observations or even 10,
for example, we can make
statements about events that will occur with
probability 1 in a million or billion yet have the same
uncertainty of only $\Delta_{x,n}$ in our estimate for x as
for predictions about events with probability, say,
1 in 100. It does not seem reasonable that a limited
sample can tell anything at all meaningful about such
extremely rare events, let alone predict them with
the same amount of uncertainty no matter what the
probability of occurrence.

This lack of agreement with common sense indicates
that the Gumbel estimator $\hat{\xi}_G = \hat{u} + y_P\hat{\beta}$ cannot be treated,
for all large values of P, as if it has an extreme-
value distribution with constant scale parameter.

Besides these common-sense considerations,
there is another reason why the extreme-value
distribution is not appropriate for the Gumbel estimator,
at least for large samples of data. The estimator
is a sample characteristic of the form

$$\hat{\xi}_G = \bar{x} + k_P\,s_x, \qquad (D.6)$$

where $k_P$ is a constant for given values of P and n.
The appropriate distribution of such an expression
is given by a general limit theorem in probability
(reference 1, page 367) to the effect that under broad
conditions any sample characteristic based on moments
(such as $\hat{\xi}_G$) is, for large values of n, approximately
normally distributed. Thus for large values of n
the Gumbel estimator (D.6) should be considered to
be approximately normal, with variance given by an
expression which increases as $k_P^2$. Moreover, this
would yield a confidence band that diverges with
increasing P, avoiding the difficulty of the parallel
curves mentioned above.


2.3 Inappropriate scale parameter


Little is known about the exact distribution
of the Gumbel estimator $\hat{\xi}_G$, particularly for small
sample sizes. Yet even if it were an extreme-value
distribution (of the form (D.3)), it would seem that
its scale parameter would not be $\beta$, but a certain
multiple of it, $B_P\beta$, found below. This multiple may
be determined by considering the relation between
the variance of the distribution (assumed extreme-
value) of $\hat{\xi}_G$ and the scale parameter $\beta_G$ of this
distribution:

$$\sigma^2(\hat{\xi}_G) = \frac{\pi^2}{6}\,\beta_G^2. \qquad (D.7)$$

But we have an approximate expression for the left
side, namely, (E.6) in Appendix E below. This is of
the form

$$\sigma^2(\hat{\xi}_G) = q(y_P)\,\beta^2, \qquad (D.8)$$

where $\beta$ is the scale parameter of the original
(extreme-value) x-distribution, and $q(y_P)$ is a
quadratic expression in the probability factor $y_P$
with coefficients involving the quantities $\sigma^2(s)$ and
cov($\bar{y}$, s), whose computation by empirical sampling is
indicated in Appendix E; $q(y_P)$ may be regarded as a
known value, depending on P. Hence, writing $S_P^2$ for
$q(y_P)$,

$$\sigma^2(\hat{\xi}_G) = S_P^2\,\beta^2. \qquad (D.9)$$

Substituting in (D.7) gives

$$\beta_G = \left(\frac{\sqrt{6}}{\pi}\,S_P\right)\beta = B_P\,\beta, \qquad (D.10)$$

which defines the multiple $B_P$. Thus the confidence-
interval half-width (D.1) must be replaced by

$$\Delta' = 1.141\,B_P\,\beta, \qquad (D.11)$$

where now $\Delta'$ is no longer constant with P, but
on account of $B_P$, actually increases very rapidly
for large values of $y_P$ corresponding to values of
P near 1. Thus we obtain a modified confidence
band whose divergence states that the amount of
uncertainty increases without limit as we attempt to
estimate increasingly improbable events. This also
avoids the conflict with common sense mentioned in
Section 2.2.

The actual values of $B_P$ are of interest and
are given in the following table for several important
values of P and for the three sample sizes for which
they were computed in Appendix E.

              $B_P = \frac{\sqrt{6}}{\pi}\,S_P$

   P       n = 10    n = 20    n = 30

  .95       .749      .560      .458
  .99      1.093      .825      .673
  .999     1.593     1.208      .986

In this table the values of $B_P$ less than 1 indicate
that the modified confidence band (equation (D.11))
is better (i.e., narrower) than the Gumbel confidence
band, and vice versa for the values of $B_P$ greater
than 1. Thus, the modified band is indicated to be
considerably better in the region P = .95 to .99 for
samples of 20 and 30. For samples of 10, the advantage
is less at P = .95 and becomes reversed in favor of
the original Gumbel confidence band at P = .99.

The above comparison remains exactly the same
for any other confidence level, it being merely necessary
to replace 1.141 in equations (D.1) and (D.11) by the
corresponding value y(C) determined from (D.4). Thus,
for the 95 percent level, y(.95) = 3.06685 (reference
6, Lecture 3, Table 3.1). At each level the
confidence intervals of the two methods are affected in
the same ratio by such multipliers, that is, their
ratio to each other remains $B_P$, regardless of confidence
level C.
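The comparison between the original half-width (D.1) and the modified half-width (D.11) is a one-line calculation once $B_P$ is known. The sketch below simply copies the $B_P$ values from the table above into a dictionary; the names `B_P`, `half_widths` and `implied_s_p` are illustrative, and `multiplier` stands for y(C) (1.141 at the 68 percent level, 3.06685 at the 95 percent level).

```python
import math

# B_P values copied from the table above: B_P[P][n].
B_P = {
    0.95:  {10: 0.749, 20: 0.560, 30: 0.458},
    0.99:  {10: 1.093, 20: 0.825, 30: 0.673},
    0.999: {10: 1.593, 20: 1.208, 30: 0.986},
}

def half_widths(p, n, beta=1.0, multiplier=1.141):
    """Return (Gumbel half-width (D.1), modified half-width (D.11))."""
    gumbel = multiplier * beta
    modified = multiplier * B_P[p][n] * beta
    return gumbel, modified

def implied_s_p(p, n):
    """Invert (D.10): S_P = B_P * pi / sqrt(6)."""
    return B_P[p][n] * math.pi / math.sqrt(6)

if __name__ == "__main__":
    for p in B_P:
        for n in (10, 20, 30):
            g, m = half_widths(p, n)
            print(p, n, round(g, 3), round(m, 3), "narrower" if m < g else "wider")
```

Because both half-widths carry the same multiplier, their ratio is $B_P$ whatever value of `multiplier` is passed in.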

3. Comparison of confidence intervals in
the Gumbel method and the
method of order statistics

Table VIII shows the actual confidence
intervals (in terms of the scale parameter $\beta$) for the two
levels C = .68 and C = .95 for the Gumbel method and
as modified by the factor $B_P$, and also compares these
(where applicable) with the intervals given by the
order-statistics method. Except for samples of 10,
for which the Gumbel interval is apt to be narrower,
the modification denoted by $B_P$, discussed in the
previous section, reduces the interval-width for P = .99
(and less) by significant amounts: by about one-
sixth or more for samples of 20 (columns 8) and by
about one-third or more for samples of 30 (columns 8,
9). These results are of course implied by the values
of $B_P$ given in the preceding section. Also, as
noted in Section 1 above, the order-statistics
confidence interval is narrower than the (unmodified)
Gumbel interval in many cases, for P not beyond .99
and sample size not below 20. However, it increases
beyond the constant Gumbel width for larger
probabilities, in agreement with theoretical requirements.
At P = .99 or less, there are two additional features
to be noted. (1) With increasing confidence level,
the numerical factor in the Gumbel interval $\Delta_{x,n}$ increases
faster than in either the modified interval $\Delta'$ or in the
order-statistics interval (denoted by $\Delta$ in Table
VIII), so that both the modified method and the order-
statistics method reduce the confidence interval of
the Gumbel method by constantly increasing
percentages as the confidence level increases. For example,
for P = .99 and for samples of 20 the order-statistics
interval is about 11 percent narrower than the Gumbel
interval for a confidence level of 68 percent and
about 30 percent narrower for a level of 95 percent
(columns 5, 7). (2) Similarly, the percentage
reduction increases with sample size. Thus, for
P = .99 and a confidence level of 68 percent, the
reductions are 11 percent for samples of 20, and 29
percent for samples of 30 (columns 8, 10).


APPENDIX  E 


DETAILS  OF  THEORETICAL  COMPARISON  BETWEEN  ORDER- 

STATISTICS  ESTIMATOR  AND  MOMENT  ESTIMATOR  OF  GUMBEL 

(Related  to  Section  6.1  of  text) 

Since the order-statistics estimator has been
fully discussed in the text, the remaining problem
in making the above comparison is, essentially, to
develop the characteristics of the Gumbel estimator.

The method of moments of Gumbel in present
use provides the following estimators for the
parameters u, $\beta$ (reference 18, page 11, equations (26),
(27); also reference 5, page 10, equation (29), but
read ($-\bar{y}_n/\alpha$) for ($+\bar{y}_n/\alpha$)):

$$\hat{u} = \bar{x} - \frac{\bar{y}_n}{\sigma_n}\,s_x, \qquad \hat{\beta} = \frac{s_x}{\sigma_n}, \qquad (E.1)$$

where $\bar{x}$, $s_x$ are the mean and standard deviation of
the given sample of size n; $\bar{y}_n$ is a certain computed
quantity, depending on the sample size n, which
approaches Euler's constant $\gamma$ = .5772... from below
as n becomes infinite; and $\sigma_n$ is another computed
quantity, depending on n, which approaches $\pi/\sqrt{6}$
= 1.28255... from below as n becomes infinite.

For sufficiently large samples the quantities
$\bar{y}_n$ and $\sigma_n$ may be replaced by their limiting values.*
This gives the somewhat simpler estimators, for
computation purposes,

$$\hat{u}' = \bar{x} - \gamma\,\frac{\sqrt{6}}{\pi}\,s_x, \qquad \hat{\beta}' = \frac{\sqrt{6}}{\pi}\,s_x. \qquad (E.2)$$

It is shown below that the net effect of this
simplification is to diminish the bias and to greatly
understate the relative efficiency of the order-
statistics estimator to the Gumbel estimator. Since
the asymptotic form (E.2) involves simpler notation
and is occasionally used in practice, it has seemed
desirable to present this case in detail below (Section
1) and also in the main text. The corresponding
results for the original form (E.1) are indicated in
Section 2 below and tabulated in Table VI.

* This has been done, e.g., in reference 3, pp. 181
ff. and 188, and in reference 18, p. 10.

1. Comparison with simplified
Gumbel estimator

From the estimators (E.2) the following
estimator of $\xi_P$ can be built up, which will be denoted
by $\hat{\xi}_G$:

$$\hat{\xi}_G = \hat{u}' + \hat{\beta}'\,y_P = \bar{x} + (y_P - \gamma)\,\frac{\sqrt{6}}{\pi}\,s_x. \qquad (E.3)$$

This is a function of the n sample values $x_1, x_2, \ldots,
x_n$ and it is desired to find its mean and variance,
and thence its bias and efficiency.

The mean is

$$E(\hat{\xi}_G) = u + \gamma\beta + (y_P - \gamma)\,\frac{\sqrt{6}}{\pi}\,E(s)\,\beta,$$

which can be rearranged to give

$$E(\hat{\xi}_G) = \xi_P + \left[\frac{\sqrt{6}}{\pi}\,E(s) - 1\right](y_P - \gamma)\,\beta, \qquad (E.4)$$

where $\xi_P = u + \beta y_P$, $E(\bar{x}) = u + \gamma\beta$, and E(s) is the
expected value of the sample standard deviation, s,
when the sample is from the "reduced" extreme-value
distribution $\exp(-e^{-y})$. Equation (E.4) shows that
the Gumbel estimator is biased* (unless $E(s) = \pi/\sqrt{6}$ for
all sample sizes, which seems highly unlikely), with bias

$$b(\hat{\xi}_G) = E(\hat{\xi}_G) - \xi_P = \left[\frac{\sqrt{6}}{\pi}\,E(s) - 1\right](y_P - \gamma)\,\beta. \qquad (E.5)$$

* An unbiased estimator analogous to $\hat{\xi}_G$ is

$$\hat{\xi}_0 = \bar{x} + (y_P - \gamma)\,s_x / E(s),$$

for, as in equation (E.4),

$$E(\hat{\xi}_0) = u + \gamma\beta + (y_P - \gamma)\,E(s)\,\beta / E(s) = u + \beta y_P = \xi_P.$$

However, this estimator could not be used in an
actual problem since E(s) is not known. Computation
of this quantity was one of the aims of the IBM
computing procedures discussed in the text.
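The simplified Gumbel estimator (E.3) and its bias (E.5) are straightforward to compute. The sketch below assumes the sample standard deviation uses the divisor n (consistent with the relation $E(s^2) = \frac{n-1}{n}\frac{\pi^2}{6}$ used later in this appendix); the function names are illustrative, and the value of E(s) passed to the bias function must come from elsewhere, for example from empirical sampling.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329
SQRT6_OVER_PI = math.sqrt(6) / math.pi

def y(p):
    """Reduced variate y_p = -log(-log p)."""
    return -math.log(-math.log(p))

def gumbel_moment_estimate(sample, p):
    """Simplified estimator (E.3): mean + (y_P - gamma) * (sqrt(6)/pi) * s_x."""
    mean = statistics.fmean(sample)
    s_x = statistics.pstdev(sample)      # assumption: divisor n, not n - 1
    return mean + (y(p) - EULER_GAMMA) * SQRT6_OVER_PI * s_x

def bias_in_beta_units(p, expected_s):
    """Bias (E.5) divided by beta, given E(s) for the reduced distribution."""
    return (SQRT6_OVER_PI * expected_s - 1.0) * (y(p) - EULER_GAMMA)

if __name__ == "__main__":
    data = [3.1, 4.7, 2.9, 5.6, 4.0, 3.8, 6.2, 4.4, 3.5, 5.0]   # made-up values
    print(gumbel_moment_estimate(data, 0.99))
    print(bias_in_beta_units(0.99, 1.15))   # 1.15 is a placeholder for E(s)
```

The bias vanishes only when E(s) equals $\pi/\sqrt{6}$, and for smaller E(s) it is negative at large P, so the estimator then tends to understate extreme predictions.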

The variance of the estimator is

$$\sigma^2(\hat{\xi}_G) = \sigma^2_{\bar{x}} + \frac{6}{\pi^2}\,(y_P-\gamma)^2\,\sigma^2_{s_x} + 2\,(y_P-\gamma)\,\frac{\sqrt{6}}{\pi}\,\mathrm{cov}(\bar{x}, s_x), \qquad (E.6)$$

where $\sigma^2_{s_x} = \beta^2\,\sigma^2(s)$ and $\mathrm{cov}(\bar{x}, s_x) = \beta^2\,\mathrm{cov}(\bar{y}, s)$;
$\sigma^2(s)$ is the variance of the sample
standard deviation for samples from the reduced
distribution $\exp(-e^{-y})$; and cov($\bar{y}$, s) is the covariance
of the mean and standard deviation in such samples.

The efficiency of $\hat{\xi}_G$ could be evaluated by
suitable generalization of equation (4.19) to biased
estimators. The variance in the denominator would
be replaced by the MSE (mean square error).* The
numerator would have to be replaced by a complicated
expression which, for unbiased estimators, would
reduce to $Q_{LB}$. Instead of evaluating efficiency for
the biased estimator $\hat{\xi}_G$, therefore, the discussion
will be greatly simplified by limiting it to relative
efficiency. The relative efficiency of one estimator
($T_1$) to another ($T_2$) is defined as the ratio of
mean square errors,

$$H(T_1, T_2) = \frac{\mathrm{MSE}(T_2)}{\mathrm{MSE}(T_1)}. \qquad (E.7)$$

* For discussion of mean square error see equation
(4.13) and accompanying text.

This ratio has been used as an index of comparison
of two estimators (e.g., reference 7). Thus, the
relative efficiency of the order-statistics estimator
$\hat{\xi}_P$ to the Gumbel estimator $\hat{\xi}_G$ is, by (4.13) and the
fact that the former estimator is unbiased,

$$H(\hat{\xi}_P, \hat{\xi}_G) = \frac{\mathrm{MSE}(\hat{\xi}_G)}{\mathrm{MSE}(\hat{\xi}_P)} = \frac{\sigma^2(\hat{\xi}_G) + (\mathrm{bias}(\hat{\xi}_G))^2}{\sigma^2(\hat{\xi}_P)} = \frac{(E.6) + (E.5)^2}{(1/k)\,Q_m}, \qquad (E.8)$$

where k is the number of subgroups of size m into
which the sample of n is partitioned (equation
(4.23), assuming there is no remainder subgroup),* and
the expressions needed for the numerator are given
by the equation numbers indicated.

Thus, n = 10 = 2 x 5 gives k = 2, m = 5; n = 20 = 4 x 5 gives k = 4, m = 5; n = 30 = 5 x 6 gives k = 5, m = 6.

The key quantities needed in the calculation of relative efficiencies are, from equations (E.5) and (E.6), $E(s)$, $\sigma^2(s)$, and $\operatorname{cov}(\bar{y}, s)$. For general sample size n, their exact values are given by multiple integrals whose evaluation would apparently require a prohibitive amount of labor. Instead, the following method of empirical sampling was used with the aid of IBM calculating and tabulating equipment. The universe of (reduced) extreme values $\Phi(y) = \exp(-e^{-y})$ was approximated by constructing a population of 12,000 suitable random numbers and punching each number on an IBM punch card. These were then mechanically separated into 1200 random samples of size n = 10, and for each sample the mean $\bar{y}$, standard deviation $s$, and their product $\bar{y}s$ were obtained.

This was equivalent to having a "population" of 1200 means, one of 1200 standard deviations, and one of 1200 products of the mean and standard deviation.

It was then assumed that the arithmetic mean of each of the three populations would be a close approximation to the mathematical expectations (averages) of the desired quantities, so that these approximations could be taken as estimates of the moments $E(\bar{y})$, $E(s)$, $E(\bar{y}s)$. From these values, and the relation

$$ E(s^2) = \frac{n-1}{n}\, \sigma_y^2 = \frac{n-1}{n} \cdot \frac{\pi^2}{6}, $$

the variance

$$ \sigma^2(s) = E(s^2) - [E(s)]^2 = \frac{n-1}{n}\, \sigma_y^2 - [E(s)]^2 $$

was computed, and also the covariance

$$ \operatorname{cov}(\bar{y}, s) = E(\bar{y}s) - E(\bar{y})E(s) = E(\bar{y}s) - \gamma E(s). $$

The five quantities $E(\bar{y})$, $\sigma^2(\bar{y})$, $E(s)$, $\sigma^2(s)$, and $\operatorname{cov}(\bar{y}, s)$ are shown in Table IX, together with the corresponding theoretical values that can be readily calculated.
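The empirical sampling procedure described above can be imitated on a modern machine. The sketch below is an illustration only, not the punched-card procedure itself: the function name, the seed, and the use of Python's pseudo-random generator are assumptions, and the sample standard deviation is computed with divisor n, which is what the relation $E(s^2) = \frac{n-1}{n}\sigma_y^2$ implies.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def empirical_moments(n, population_size=12000, seed=1):
    """Imitate the card experiment: build a population of reduced extreme
    values, split it at random into samples of size n, and average."""
    rng = random.Random(seed)
    population = []
    while len(population) < population_size:
        u = rng.random()
        if u > 0.0:                      # guard against log(0)
            population.append(-math.log(-math.log(u)))  # inverse of exp(-e^{-y})
    rng.shuffle(population)              # "mechanical" random separation

    n_samples = population_size // n
    sum_mean = sum_sd = sum_prod = 0.0
    for j in range(n_samples):
        sample = population[j * n:(j + 1) * n]
        y_bar = sum(sample) / n
        s = math.sqrt(sum((y - y_bar) ** 2 for y in sample) / n)  # divisor n (assumed)
        sum_mean += y_bar
        sum_sd += s
        sum_prod += y_bar * s

    e_y = sum_mean / n_samples
    e_s = sum_sd / n_samples
    e_ys = sum_prod / n_samples
    var_s = (n - 1) / n * math.pi ** 2 / 6 - e_s ** 2   # relation for E(s^2)
    cov_ys = e_ys - EULER_GAMMA * e_s
    return {"E(y)": e_y, "E(s)": e_s, "var(s)": var_s, "cov(y,s)": cov_ys}


moments_10 = empirical_moments(10)
print(moments_10)
```

Calling `empirical_moments(20)` or `empirical_moments(30)` corresponds to regrouping the same 12,000 values into 600 or 400 samples.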

In actual use this procedure was modified somewhat, since only one value of each of the desired quantities would be produced by the 12,000 cards and 1200 samples. This single value would be subject to the fluctuations of random sampling and would be difficult to rely on in making inferences. This difficulty was met by breaking the "population" of 1200 samples into 12 sets of 100 samples and obtaining 12 values of each of the desired moments instead of only one. These 12 values, although each was based on fewer samples, served to furnish an idea of how the single value based on 1200 samples was affected by sampling variation. Such analysis has provided a far firmer basis for judgement of relative efficiency.
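The idea of splitting the 1200 samples into 12 sets of 100 can be sketched as follows. This is an illustration under stated assumptions: the helper names and the seed are hypothetical, and the samples are drawn afresh from the reduced distribution rather than dealt from a fixed deck of 12,000 cards.

```python
import math
import random
import statistics


def reduced_value(rng):
    """One reduced extreme value, by inverse transform of exp(-e^{-y})."""
    u = rng.random()
    while u <= 0.0:
        u = rng.random()
    return -math.log(-math.log(u))


def sample_sd(rng, n):
    """Standard deviation (divisor n, assumed) of one sample of size n."""
    ys = [reduced_value(rng) for _ in range(n)]
    mean = sum(ys) / n
    return math.sqrt(sum((y - mean) ** 2 for y in ys) / n)


def set_means(values, set_size=100):
    """Mean of each consecutive block of `set_size` per-sample values."""
    return [statistics.fmean(values[i:i + set_size])
            for i in range(0, len(values) - set_size + 1, set_size)]


rng = random.Random(7)
sds = [sample_sd(rng, 10) for _ in range(1200)]
per_set = set_means(sds)                 # 12 estimates of E(s)
overall = statistics.fmean(sds)          # single estimate from all 1200 samples
spread = statistics.stdev(per_set)       # how much the 12 estimates scatter
print(len(per_set), round(overall, 4), round(spread, 4))
```

The scatter of the 12 set values gives a rough measure of how far the single pooled value might be off because of sampling variation.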




The above procedure resulted in moments calculated for samples of size n = 10. In like manner, 600 random samples of size n = 20 were drawn, after starting afresh by putting all 12,000 cards together, but this time only 6 instead of 12 sets of 100 samples were available, resulting in 6 values of the desired quantities for comparison. Finally, the 12,000 cards were reprocessed to yield 400 samples of size n = 30, giving 4 values each based on a set of 100 samples.

The resulting sets of 12, 6, and 4 values each were substituted in the appropriate formulas (E.5), (E.6), (E.8) in order to obtain the relative efficiency of the order-statistics estimator to the (simplified) Gumbel estimator. These formulas, all of which depend upon $y_P$, were evaluated at the probability level P = .95. All these results are summarized in Table V, which shows the values of the bias, mean square error, and relative efficiency calculated for each set of 100 samples of sizes 10, 20, and 30, together with the corresponding average values obtained from all 1200 samples combined. For ease of comparison, the relative efficiencies are also charted in Figure 6.

These results constitute the basis of the statement in the text that at the probability level P = .95, for samples of 20 and 30, the proposed method has greater efficiency than the Gumbel method using the simplified estimator, while for samples of 10 the efficiencies are about the same.
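As a rough check on the statement above, the relative efficiency (E.8) can be evaluated from pooled moments. The sketch below is not a reproduction of Table V: the function name is hypothetical, beta is set to 1 (it cancels in the ratio), the moments are the pooled empirical values as read from Table IX, and the values of $Q_m$ at P = .95 (1.90349 for m = 5, 1.54171 for m = 6) are those appearing in Worksheet 2.

```python
import math

EULER_GAMMA = 0.5772156649015329


def relative_efficiency(p, n, k, q_m, e_s, var_s, cov_ys):
    """MSE of the simplified Gumbel estimator divided by the variance
    Q_m / k of the order-statistics estimator, as in (E.8) with beta = 1."""
    c = math.sqrt(6.0) / math.pi
    y_p = -math.log(-math.log(p))
    d = c * (y_p - EULER_GAMMA)
    bias = (y_p - EULER_GAMMA) * (c * e_s - 1.0)                          # (E.5)
    variance = math.pi ** 2 / (6.0 * n) + d * d * var_s + 2.0 * d * cov_ys  # (E.6)
    return (variance + bias * bias) / (q_m / k)


# (n, k, Q_m, E(s), var(s), cov(y, s)); moments as read from Table IX.
cases = {
    10: (2, 1.90349, 1.1656, 0.1321, 0.0800),   # 2 subgroups of m = 5
    20: (4, 1.90349, 1.2211, 0.0775, 0.0438),   # 4 subgroups of m = 5
    30: (5, 1.54171, 1.2459, 0.0513, 0.0297),   # 5 subgroups of m = 6
}
re_values = {n: relative_efficiency(0.95, n, *args) for n, args in cases.items()}
for n, value in re_values.items():
    print(n, round(value, 3))
```

With these inputs the ratio is close to 1 for n = 10 and clearly above 1 for n = 20 and 30, which agrees in direction with the statement in the text.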




2. Comparison with original Gumbel estimator

The estimator corresponding to $\hat{\xi}_G$ in (E.3), built up from the estimators (E.1), is

$$ \hat{\xi}_{G,n} = \hat{u} + \hat{\beta}\, y_P = \bar{x} + k_n s_x, \tag{E.9} $$

where

$$ k_n = \frac{y_P - \bar{y}_n}{\sigma_n} = b_{n,P}\, d. \tag{E.10} $$

Here

$$ d = \frac{\sqrt{6}}{\pi} (y_P - \gamma) $$

and

$$ b_{n,P} = \frac{\sigma}{\sigma_n} \cdot \frac{y_P - \bar{y}_n}{y_P - \gamma}, \qquad \sigma = \frac{\pi}{\sqrt{6}}, \tag{E.11} $$

is the conversion factor for passing from the multiplier, d, of $s_x$ in (E.3) to $k_n$ in (E.9). It is apparent from the discussion at the beginning of this Appendix that for infinitely large n, $b_{n,P} = 1$, so that (E.9) includes the asymptotic case. For finite n, however, $\sigma_n < \sigma$ and $\bar{y}_n < \gamma$. Hence $b_{n,P}$, being a product of two factors each greater than 1, may considerably exceed 1, so that the multiplier $k_n = b_{n,P}\, d$ of (E.10) and (E.11) becomes appreciably larger than the multiplier d in (E.3). Thus, for samples of 10, 20, 30, computation shows that, for P = .95, for example,

$$ k_{10} = 1.397\, d, \qquad k_{20} = 1.234\, d, \qquad k_{30} = 1.173\, d. \tag{E.12} $$
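The multipliers in (E.12) can be checked with a few lines of code. The sketch below rests on an assumption not stated in this part of the report: that $\bar{y}_n$ and $\sigma_n$ are the mean and standard deviation (divisor n) of the reduced variates at the plotting positions i/(n+1), as in Gumbel's tables. The function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329
SIGMA = math.pi / math.sqrt(6.0)     # standard deviation of the reduced distribution


def reduced_mean_sd(n):
    """Mean and standard deviation (divisor n) of y_i = -ln(-ln(i/(n+1)))."""
    ys = [-math.log(-math.log(i / (n + 1))) for i in range(1, n + 1)]
    mean = sum(ys) / n
    sd = math.sqrt(sum((y - mean) ** 2 for y in ys) / n)
    return mean, sd


def conversion_factor(n, p):
    """b_{n,P} of (E.11): ratio of the finite-sample multiplier k_n to d."""
    y_p = -math.log(-math.log(p))
    y_n, sigma_n = reduced_mean_sd(n)
    return (SIGMA / sigma_n) * (y_p - y_n) / (y_p - EULER_GAMMA)


factors = {n: conversion_factor(n, 0.95) for n in (10, 20, 30)}
for n, b in factors.items():
    print(n, round(b, 3))
```

Under this assumption the factors agree with the three multipliers quoted in (E.12) to about three decimals, and both factors of the product exceed 1 for each n.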


The bias of $\hat{\xi}_{G,n}$ is, in manner similar to Section 1, in view of (E.10),

$$ b(\hat{\xi}_{G,n}) = \left[ -(y_P - \gamma) + k_n E(s) \right] \beta = (y_P - \gamma) \left[ \frac{\sqrt{6}}{\pi} E(s)\, b_{n,P} - 1 \right] \beta. \tag{E.13} $$

Table VI (columns 2 and 3) indicates that the presence of the factor $b_{n,P}$ converts the small negative biases into larger positive ones.
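The change of sign of the bias can be illustrated numerically. This is a sketch only, not a reproduction of Table VI: the function name is hypothetical, the bias is expressed in units of beta, E(s) = 1.1656 is the empirical n = 10 value as read from Table IX, and 1.397 is the n = 10 factor from (E.12).

```python
import math

EULER_GAMMA = 0.5772156649015329


def gumbel_bias(p, e_s, b=1.0):
    """Bias in units of beta: b = 1 gives the simplified estimator (E.5),
    b = b_{n,P} gives the original estimator (E.13)."""
    y_p = -math.log(-math.log(p))
    return (y_p - EULER_GAMMA) * (math.sqrt(6.0) / math.pi * e_s * b - 1.0)


bias_simplified = gumbel_bias(0.95, 1.1656)
bias_original = gumbel_bias(0.95, 1.1656, b=1.397)
print(round(bias_simplified, 3), round(bias_original, 3))
```

For these inputs the simplified estimator has a small negative bias and the original estimator a larger positive one.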

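For the variance we have from (E.9), analogously to Section 1,

$$ \sigma^2(\hat{\xi}_{G,n}) = \beta^2 \left[ \sigma^2(\bar{y}) + k_n^2 \sigma_s^2 + 2 k_n \operatorname{cov}(\bar{y}, s) \right]. $$

The corresponding expression (E.6) may be written

$$ \sigma^2(\hat{\xi}_G) = \beta^2 \left[ \sigma^2(\bar{y}) + d^2 \sigma_s^2 + 2 d \operatorname{cov}(\bar{y}, s) \right]. $$

Comparison of these two expressions shows, since $\operatorname{cov}(\bar{y}, s)$ was found to be positive, that replacement of d by the larger value $k_n$ considerably increases the variance of the Gumbel estimator. Values of the variance for the original and simplified estimators are listed in columns 4 and 5 of Table VI.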


Comparison of these columns indicates that the variance of the original estimator can become more than half again as large as the variance of the simplified estimator, depending on sample size and probability P. The effect is most marked for the lower levels of P and smaller sample sizes and disappears as shown when both these factors increase.

The result of the above increases in bias and variance is to greatly increase the mean square error (columns 6 and 7, Table VI), and thus to increase the relative efficiency of the order-statistics estimator (columns 8, 9). As a result the order-statistics estimator is up to twice as efficient as the original Gumbel estimator even for samples as small as 10.

This tremendous increase in efficiency falls off slowly, as shown, when sample size increases. For fixed sample size the efficiency increases for large values of P. These differences in efficiency are sufficiently large to completely outweigh any fluctuations of random sampling attributable to the empirical sampling method of evaluation used.

It must be concluded, therefore, that the original Gumbel estimator is both more biased and much less efficient than its simplified form. As a result, comparison of the order-statistics estimator with the original Gumbel estimator gives very conservative results and greatly understates the actual improvement in efficiency of the proposed method over the method in present use.
ESTIMATION OF EXTREMES
(For instructions see Section 5)

Worksheet 1 - Determination of Estimators

Data: NACA - Langley - Sample III        Computer: J. L.        Date: 5/29/52

I. SUBGROUP SIZES AND PROPORTIONALITY FACTORS:

    n = 23 = km + m' = 3 x 6 + 5         k = 3,  m = 6,  m' = 5
    t  = km/n = 0.78261                  t^2/k = 0.20416
    t' = m'/n = 0.21739                  t'^2  = 0.04726

IIA. MAIN SUBGROUPS:

Weights a_i, b_i (from Table I)

    i        1         2         3         4         5         6       Check sum
    a_i     .35545    .22549    .16562    .12105    .08352    .04887    1
    b_i    -.45928   -.03599    .07319    .12673    .14953    .14581   -.00001

Observations x_i in increasing order from i = 1 to i = m

    Subgroup
    No.       x_1     x_2     x_3     x_4     x_5     x_6    Check sum   Σ a_i x_i   Σ b_i x_i
    1         .75     .81     .90    1.08    1.20    1.38      6.12       .89669      .20978
    2         .75     .80     .88     .90    1.08    1.20      5.61       .85052      .14168
    3         .98    1.00    1.02    1.15    1.31    1.43      6.89      1.06127      .13870
    Sum      2.48    2.61    2.80    3.13    3.59    4.01     18.62      2.80848      .49016

    T = (Σ a_i x_i)/k + [(Σ b_i x_i)/k] y_P = .93616 + .16339 y_P

IIB. REMAINDER SUBGROUP:

Weights a'_i, b'_i (from Table I)

    i        1         2         3         4         5       Check sum
    a'_i    .41893    .24628    .16761    .10882    .05835    .99999
    b'_i   -.50313    .00654    .13045    .18166    .18448    0

Observations x'_i in increasing order from i = 1 to i = m'

              x'_1    x'_2    x'_3    x'_4    x'_5   Check sum   Σ a'_i x'_i   Σ b'_i x'_i
              .75     .93    1.01    1.15    1.16      5.00        .90535        .18340

    T' = Σ a'_i x'_i + (Σ b'_i x'_i) y_P = .90535 + .18340 y_P

III. ESTIMATORS:

    ξ̂_P = t T + t' T' = .92946 + .16774 y_P
    û = .92946,   β̂ = .16774
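The arithmetic of Worksheet 1 can be retraced in code. The sketch below is an illustration with hypothetical function names; the weights and observations are those transcribed in the worksheet, and only subgroup sizes 5 and 6 are supported because only those weights appear here.

```python
# Weights a_i, b_i for subgroup sizes 6 and 5, as transcribed from Worksheet 1.
WEIGHTS = {
    6: ([0.35545, 0.22549, 0.16562, 0.12105, 0.08352, 0.04887],
        [-0.45928, -0.03599, 0.07319, 0.12673, 0.14953, 0.14581]),
    5: ([0.41893, 0.24628, 0.16761, 0.10882, 0.05835],
        [-0.50313, 0.00654, 0.13045, 0.18166, 0.18448]),
}


def weighted_sums(values):
    """Return (sum a_i x_i, sum b_i x_i) for one subgroup, sorted ascending."""
    xs = sorted(values)
    a, b = WEIGHTS[len(xs)]
    return (sum(ai * x for ai, x in zip(a, xs)),
            sum(bi * x for bi, x in zip(b, xs)))


def order_statistics_estimate(main_subgroups, remainder=None):
    """Combine k main subgroups of size m and an optional remainder subgroup
    into the estimates (u, beta), weighting by t = km/n and t' = m'/n."""
    k = len(main_subgroups)
    m = len(main_subgroups[0])
    m_rem = len(remainder) if remainder else 0
    n = k * m + m_rem
    t = k * m / n
    sums = [weighted_sums(group) for group in main_subgroups]
    u = t * sum(s[0] for s in sums) / k
    beta = t * sum(s[1] for s in sums) / k
    if remainder:
        a_sum, b_sum = weighted_sums(remainder)
        u += (m_rem / n) * a_sum
        beta += (m_rem / n) * b_sum
    return u, beta


main = [
    [0.75, 0.81, 0.90, 1.08, 1.20, 1.38],
    [0.75, 0.80, 0.88, 0.90, 1.08, 1.20],
    [0.98, 1.00, 1.02, 1.15, 1.31, 1.43],
]
rem = [0.75, 0.93, 1.01, 1.15, 1.16]
u_hat, beta_hat = order_statistics_estimate(main, rem)
print(round(u_hat, 5), round(beta_hat, 5))
```

The predicted value at any probability level P then follows as `u_hat + beta_hat * y_P`, with `y_P = -ln(-ln P)`.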
ESTIMATION OF EXTREMES

Worksheet 2 - Predicted values, confidence band, efficiency, plotting positions

Columns:
    (1) P
    (2) y_P
    (3) PREDICTED VALUES, ξ̂_P = .92946 + .16774 y_P
    (4) Q_m for the main subgroups, m = 6
    (5) Q_m' for the remainder subgroup, m' = 5
    (6) Var(ξ̂_P) = (t^2/k) x (4) + t'^2 x (5), with t^2/k = .20416, t'^2 = .04726
    (7) 68% CONFIDENCE BAND HALF-WIDTH = sqrt(Var(ξ̂_P)), with β̂ = .16774
    (8) Q_LB = Q_0/n (Q_0 from Table III)
    (9) EFFICIENCY (from Table III), E = Q_LB / Var(ξ̂_P)

Columns (4), (5), (6), and (8) are in units of β^2, except in the last row, where they are in units of y^2 β^2.

    (1)        (2)        (3)          (4)        (5)        (6)        (7)       (8)        (9)
    .36788 *   0           .92946 g     .19117     .23140     .04997    .0375      .04820    .965
    .50        0.36651     .99094 g     .23189     .27870     .06051    .0413      .05994    .991
    .90        2.25037    1.30694 g    1.00065    1.22831     .26234    .0859      .23235    .886
    .95        2.97020    1.42768 g    1.54171    1.90349     .40471    .1067      .34777    .859
    .99        4.60016    1.70109 g    3.27230    4.07062     .86045    .1556      .71035    .826
    .999       6.90726    2.08808 g    6.92044    8.65173    1.82176    .2264     1.46364    .803
    1 **       --          --          0.13196    0.16665    0.03482     --       0.02643    .759

    *  Estimate of u.
    ** Estimate of β.
PLOTTING POSITIONS

Observed extremes in increasing rank from 1 to n = 23

    Rank   Observed   Plotting       Rank   Observed   Plotting       Rank   Observed   Plotting
    r      Extreme    Position       r      Extreme    Position       r      Extreme    Position
                      r/(n+1)                          r/(n+1)                          r/(n+1)

     1       .75       .0417         11      1.00       .4583         21      1.31       .8750
     2       .75       .0833         12      1.01       .5000         22      1.38       .9167
     3       .75       .1250         13      1.02       .5417         23      1.43       .9583
     4       .80       .1667         14      1.08       .5833         Sum    23.62
     5       .81       .2083         15      1.08       .6250
     6       .88       .2500         16      1.15       .6667
     7       .90       .2917         17      1.15       .7083
     8       .90       .3333         18      1.16       .7500
     9       .93       .3750         19      1.20       .7917
    10       .98       .4167         20      1.20       .8333
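The plotting positions above follow the rule r/(n+1). A minimal sketch, with a hypothetical function name and the 23 observations as listed in the worksheet:

```python
def plotting_positions(values):
    """Return (rank, value, r/(n+1)) for the values sorted in increasing order."""
    xs = sorted(values)
    n = len(xs)
    return [(r, x, r / (n + 1)) for r, x in enumerate(xs, start=1)]


observed = [0.75, 0.75, 0.75, 0.80, 0.81, 0.88, 0.90, 0.90, 0.93, 0.98,
            1.00, 1.01, 1.02, 1.08, 1.08, 1.15, 1.15, 1.16, 1.20, 1.20,
            1.31, 1.38, 1.43]
positions = plotting_positions(observed)
for rank, value, pos in positions[:3]:
    print(rank, value, round(pos, 4))
```

Tied observations receive consecutive ranks, as in the worksheet.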

Weights for minimum-variance, unbiased, linear order-statistics estimator $\hat{\xi}_P = \sum_{i=1}^{n} (a_i + b_i y_P)\, x_i$ of percentage points $\xi_P = u + \beta y_P$, and variance $\mathrm{Var}(\hat{\xi}_P) = (A_n y_P^2 + B_n y_P + C_n)\, \beta^2$, for sample size n = 2 to 6, $x_1 \le x_2 \le \cdots \le x_n$

Limiting efficiency as P approaches 1. These are also the efficiencies for the estimator of the parameter β.


TABLE IV

Efficiency of order-statistics estimators for various sample sizes n = km + m' partitioned into subgroups as indicated, for P = .99 and P = 1

(Discussed in Section 4.6)

                               EFFICIENCY (percent)                             EFFICIENCY (percent)
    n            km + m'       P = .99     P = 1         n        km + m'       P = .99     P = 1

    2 or k x 2                  54.0        42.7         21       3 x 6 + 3      80.8        73.6
    3 or k x 3                  68.7        58.8         22       3 x 6 + 4      81.8        74.9
    4 or k x 4                  75.9        67.5         23       3 x 6 + 5      82.6        75.9
    5 or k x 5                  80.3        73.0         24       4 x 6          83.2        76.8
    6 or k x 6                  83.2        76.8         25       5 x 5          80.3        73.0
    7 (a)        1 x 5 + 2      70.5        60.7         26       4 x 6 + 2      79.9        72.3
    8 (b)        1 x 6 + 2      73.3        63.8         27       4 x 6 + 3      81.3        74.3
    9            1 x 6 + 3      77.7        69.1         28       4 x 6 + 4      81.9        75.3
    10           2 x 5          80.3        73.0         29       4 x 6 + 5      82.7        76.1
    11           1 x 6 + 5      81.9        75.0         30       5 x 6          83.2        76.8
    12           2 x 6          83.2        76.8         31       5 x 5 + 6      80.8        73.7
    13           2 x 5 + 3      77.3        69.1         32       5 x 6 + 2      80.5        73.1
    14 (c)       2 x 6 + 2      74.7        66.7         33       5 x 6 + 3      81.6        74.7
    15           3 x 5          80.3        73.0         34       5 x 6 + 4      82.3        75.6
    16           2 x 6 + 4      81.3        74.2         35 (d)   7 x 5          80.3        73.0
    17           2 x 6 + 5      82.3        75.6         36       6 x 6          83.2        76.8
    18           3 x 6          83.2        76.8         37       7 x 5 + 2      78.2        70.3
    19           3 x 5 + 4      79.3        71.7         38       6 x 6 + 2      80.9        73.7
    20           4 x 5          80.3        73.0         39       6 x 6 + 3      81.9        75.0
                                                         40 (e)   8 x 5          80.3        73.0
                                                         61       11 x 5 + 6     80.6        73.3

    (a) If partition is 7 = 1 x 4 + 3, then efficiencies are: P = .99, eff. = 72.7%; P = 1, eff. = 63.4%.
    (b) If partition is 8 = 2 x 4, then efficiencies are: P = .99, eff. = 75.9%; P = 1, eff. = 67.5%.
    (c) If partition is 14 = 2 x 5 + 4, then efficiencies are: P = .99, eff. = 79.0%; P = 1, eff. = 71.3%.
    (d) If partition is 35 = 5 x 6 + 5, then efficiencies are: P = .99, eff. = 82.8%; P = 1, eff. = 76.2%.
    (e) If partition is 40 = 6 x 6 + 4, then efficiencies are: P = .99, eff. = 82.4%; P = 1, eff. = 75.7%.
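Several P = 1 entries of Table IV can be retraced from quantities that appear in Worksheet 2. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the limiting coefficients 0.16665 (m = 5) and 0.13196 (m = 6) are taken from the "Estimate of β" row of Worksheet 2, and the limiting lower-bound coefficient $Q_0$ is assumed to equal 6/π², which is consistent with the worksheet value .02643 for n = 23. Only subgroup sizes 5 and 6 are covered.

```python
import math

Q0_LIMIT = 6.0 / math.pi ** 2            # assumed limiting Q_0 (about 0.6079)
A_COEFF = {5: 0.16665, 6: 0.13196}       # limiting variance coefficients, Worksheet 2


def limiting_efficiency(k, m, m_rem=0):
    """Efficiency at P = 1 for n = k*m + m_rem, with k subgroups of size m
    and an optional remainder subgroup of size m_rem."""
    n = k * m + m_rem
    t = k * m / n
    variance = (t * t / k) * A_COEFF[m]
    if m_rem:
        variance += (m_rem / n) ** 2 * A_COEFF[m_rem]
    return (Q0_LIMIT / n) / variance


checks = {
    "11 = 1 x 6 + 5": limiting_efficiency(1, 6, 5),
    "23 = 3 x 6 + 5": limiting_efficiency(3, 6, 5),
    "30 = 5 x 6": limiting_efficiency(5, 6),
    "31 = 5 x 5 + 6": limiting_efficiency(5, 5, 6),
    "35 = 5 x 6 + 5": limiting_efficiency(5, 6, 5),
}
for label, eff in checks.items():
    print(label, round(100.0 * eff, 1))
```

The comparison between 35 = 7 x 5 and 35 = 5 x 6 + 5 in footnote (d) shows why the choice of partition matters: larger subgroups give higher efficiency even when a remainder subgroup is needed.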
TABLE VII - PART A
PROBABILITIES OF EXTREMES

(Discussed in Section 6.2)

Data: NACA - Langley

Form columns: Extremes x; Frequency p; Mean and Standard Deviation (x·p, x²·p); Cumulative Frequency m; Plotting Positions m/(N+1). Each observation is entered singly.

    Extremes      Cumulative       Plotting Positions
    x             Frequency m      m/(N+1)

     .75               1               0.0417
     .75               2               0.0833
     .75               3               0.1250
     .80               4               0.1667
     .81               5               0.2083
     .88               6               0.2500
     .90               7               0.2916
     .90               8               0.3333
     .93               9               0.3750
     .98              10               0.4167
    1.00              11               0.4583
    1.01              12               0.5000
    1.02              13               0.5416
    1.08              14               0.5833
    1.08              15               0.6250
    1.15              16               0.6667
    1.15              17               0.7083
    1.16              18               0.7500
    1.20              19               0.7917
    1.20              20               0.8333
    1.31              21               0.8750
    1.38              22               0.9167
    1.43              23               0.9583

    Sums:  N = 23,   Σ(xp) = 23.62
TABLE VII - PART B

PROBABILITIES OF EXTREMES

(Discussed in Section 6.2)

Data: NACA - Langley, Sample No. III        Date: 5/13/52

I. Mean and Standard Deviation:

    N = 23                       Σ(xp) = 23.62              Σ(x²p) = 25.1326
    √N = 4.7958                  Mean: x̄' = 1.0270          Mean square = 1.0927
    Arbitrary Mean: x_0 = 0                                 (x̄')² = 1.0547
    True Mean: x̄ = 1.0270                                   (s_x)² = 0.0380
    N/(N-1) = --                                            Standard Deviation: s_x = 0.1950

II. Parameters:

    σ_N = 1.0811                 ȳ_N = 0.5283
    1/α = s_x/σ_N = 0.1804
    1/(α√N) = (1/α)/√N = 0.0376
    ȳ_N (1/α) = 0.0953
    u = x̄ ∓ ȳ_N (1/α) = 0.9317  (mode)

    NOTE: Upper sign used for maxima, lower sign for minima.

III. Line of Expected Extremes:

    x = u ± (1/α) y = 0.9317 + 0.1804 y

    y:          -2.00      0.00      3.00      5.00      2.25      4.60
    y(1/α):     -.3608     0         .5412     .9020     .4059     .8298
    x:           .5709     .9317    1.4729    1.8337     --        --

    NOTE: Values x_10 and x_100 are for return periods of 10 and 100.

Half-width of 0.68269 Confidence Band, σ_{x,m}:

    σ_{x,m} = σ_{y,m} √N / (α√N) = (σ_{y,m} √N) [(1/α)/√N]

    Φ(x):           .150     .200     .300     .400     .500     .600     .700     .800     .850
    σ_{y,m} √N:     1.255    1.243    1.268    1.337    1.443    1.598    1.835    2.241    2.585
    σ_{x,m}:        .0472    .0467    .0477    .0503    .0543    .0601    .0690    .0843    .0972

    For largest value, Δx_N = 1.141 (1/α) = 0.2058
    For next-to-largest value, Δx_{N-1} = .759 [N/(N-1)] (1/α) = 0.1369

Expected Extreme, in T periods (years, etc.):  x_T = x_10 + z_T (x_100 - x_10)

    T      z_T          T       z_T          T        z_T
    15     .180         60      .781         140      1.144
    20     .306         70      .847         150      1.173
    25     .404         80      .905         200      1.296
    30     .483         90      .955         300      1.469
    35     .549         100     1.000        400      1.592
    40     .607         110     1.041        500      1.687
    45     .658         120     1.078        750      1.859
    50     .703         130     1.112        1000     1.990
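The parameter computation on this form (the Gumbel moment method) can be retraced as follows. This is a sketch with a hypothetical function name; the constants ȳ_N = 0.5283 and σ_N = 1.0811 for N = 23 are taken as read from the form, and the standard deviation is computed with divisor N, which is what the arithmetic on the form appears to use.

```python
import math


def gumbel_moment_fit(values, y_bar_n, sigma_n):
    """Return (mean, s_x, 1/alpha, u) for the line x = u + (1/alpha) y,
    fitted to maxima by the moment method of the form."""
    n = len(values)
    mean = sum(values) / n
    s_x = math.sqrt(sum(x * x for x in values) / n - mean * mean)  # divisor N (assumed)
    inv_alpha = s_x / sigma_n
    u = mean - y_bar_n * inv_alpha       # upper sign, since the data are maxima
    return mean, s_x, inv_alpha, u


observed = [0.75, 0.75, 0.75, 0.80, 0.81, 0.88, 0.90, 0.90, 0.93, 0.98,
            1.00, 1.01, 1.02, 1.08, 1.08, 1.15, 1.15, 1.16, 1.20, 1.20,
            1.31, 1.38, 1.43]
mean, s_x, inv_alpha, u = gumbel_moment_fit(observed, y_bar_n=0.5283, sigma_n=1.0811)
print(round(mean, 4), round(s_x, 4), round(inv_alpha, 4), round(u, 4))
```

Small differences in the last decimal from the values on the form are to be expected, since the form carries rounded intermediate results.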
Empirical sampling values of first and second moments of sample mean $\bar{y}$ and standard deviation s for samples of n = 10, 20, 30, compared with the corresponding theoretical value where obtainable
(Discussed in Appendix E)

                     n = 10 (1200 samples)        n = 20 (600 samples)         n = 30 (400 samples)
    Estimate of      Empirical    Theoretical     Empirical    Theoretical     Empirical    Theoretical
                     values       values          values       values          values       values

    E(ȳ)             0.5698       0.5772          0.5698       0.5772          0.5698       0.5772
    σ²(ȳ)            0.1663       0.1645          0.0884       0.0822          0.0535       0.0548
    E(s)             1.1656       --              1.2211       --              1.2459       --
    σ²(s)            0.1321       --              0.0775       --              0.0513       --
    cov(ȳ, s)        0.0800       --              0.0438       --              0.0297       --
Figure 3. - Comparison of efficiencies of order-statistics estimator $\hat{\xi}_P$ for samples of sizes 2, 3, 4, 5, 6, or for samples of any size if broken into equal subgroups of 2 to 6. (Data from Table III, Part B.)

Figure 4. - Graphical analysis of a sample of 23 maximum acceleration increments by method of order statistics. (Data from Worksheet 2.)